Real-Time Prediction of Transarterial Drug Delivery Based on a Deep Convolutional Neural Network

: This study develops a data-driven reduced-order model based on a deep convolutional neural network (CNN) for real-time and accurate prediction of the drug trajectory and concentration ﬁeld in transarterial chemoembolization therapy to assist in directing the drug to the tumor site. The convolutional and deconvoluational layers are used as the encoder and the decoder, respectively. The input of the network model is designed to contain the information of drug injection location and the blood vessel geometry and the output consists of the drug trajectory and the concentration ﬁeld. We studied drug delivery in two-dimensional straight, bifurcated blood vessels and the human hepatic artery system and showed that the proposed model can quickly and accurately predict the spatial–temporal drug concentration ﬁeld. For the human hepatic artery system, the most complex case, the average prediction accuracy was 99.9% compared with the CFD prediction. Further, the prediction time for each concentration ﬁeld was less than 0.07 s, which is four orders faster than the corresponding CFD simulation. The high performance, accuracy and speed of the CNN model shows the potential for effectively assisting physicians in directing chemoembolization drugs to tumor-bearing segments, thus improving its efﬁcacy in real-time.


Introduction
Hepatocellular carcinoma (HCC) is one of the cancers with the greatest morbidity and mortality rates worldwide [1][2][3]. For patients with unresectable HCC, transarterial chemoembolization (TACE) has been recognized as the preferred treatment option [3]. However, embolic and chemotherapeutic agents often have negative effects on non-tumor sites due to the delivery not being directed to the target site: chemotherapeutic agents destroy healthy tissues and embolic agents block off arteries that should not be occluded. The drug trajectory and concentration distribution in hepatic artery therapeutics are affected by clinical parameters, such as the injection position, injection speed and injection angle [4][5][6][7]. If physicians could be provided advance, or real time, knowledge of the optimal clinical parameters or the drug trajectory and concentration field, it would be possible to accurately deliver the therapeutic drugs to the tumor site with the guidance of medical imaging, thereby reducing the damage to healthy tissues and improving treatment efficacy.
Compared to experimental methods, numerical simulation has been more widely used in drug release studies due to the reduced cost [8]. Therefore, many researchers attempted to evaluate the impact of clinical parameters on the transarterial drug trajectory in interventional treatment with numerical simulations. Qiu [9] investigated the effect of the relative position of the catheter spout and the liquid drug injection angle on the drug perfusion rate in vascular interventional treatment. However, only idealized bifurcated blood vessel geometry and limited numbers of injection positions and angles were examined. Basciano et al. [10] developed a computational model of a representative hepatic artery system to study the influence of the particle release position and the temporal release window on the velocity field and the particle trajectory for Yttrium-90 (90Y) radioembolization treatment. The three-pulse simulation took approximately 72.87 h. To compare the impact of the microagent quantity and size, catheter-tip position on the trajectory of the microagent, Aramburu et al. [11] performed numerical simulations to mimic liver radioembolization in terms of delivering particles to tumor-bearing segments and concluded that the catheter-tip position has the most important effect on the trajectories of the microagent. While these studies provide significant insight on the influence of clinical parameters on drug delivery in the human hepatic artery, the computational cost is huge due to the complex geometry and nonlinearity of the flow problem.
Over the past decade, a considerable amount of literature has been published on machine learning (ML) and deep learning (DL), due to their ability to derive powerful reduced-order models (ROM) to address large computational and highly nonlinear problems. In particular, convolutional neural networks (CNNs), as one of the representative algorithms of deep learning, can perform stable and efficient learning of lattice-point features (such as pixels) with small computational resources and have been widely used in image processing [12][13][14]. In numerical simulation, the physical field often appears in the form of a spatial distribution, similar to a digital image. Therefore, CNNs have the potential to provide accurate ROMs for physical field prediction.
Recently, a series of papers demonstrated the above point by investigating a CNNbased ROM to predict the flow field over cylinders and airfoils. Han et al. [15] proposed a hybrid deep neural network to predict the flow field over a cylinder and an airfoil at future times based on the flow field at previous times with good agreement with numerical simulations. Peng et al. [16] showed that a CNN-based ROM can achieve even greater performance than the network model with a fully connected decoder. By introducing the angle of attack as an additional input [17], they were able to maintain high accuracy for arbitrary angle of attack. They further developed a geometry-and boundary-conditionadaptive CNN-based ROM for fast flow field prediction [18]. In the latter work, the average prediction error dropped to 6% and the prediction speed was two orders of magnitude faster than the numerical simulation. Few studies related to the field reconstruction in channels are available. Liu et al. [19] proposed a CNN network to predict the flow field in a 2D micro channel, reporting good accuracy and fast computational speed of the model.
In the medical industry, CNN has also reported several very striking research results. Liu S et al. [20] was the first to propose the use of CNN for drug-drug interaction (DDI) extraction and demonstrated superiority over other DDI extraction methods. Subsequent scholars [21,22] conducted more in-depth exploration and improved the accuracy of drugdrug interaction extraction. Hui Teng [23] established a gray relational analysis (GRA) and related drug evaluation system for mental illness based on deep CNN, which combined with the medical knowledge base, analyzed the disease results and provided data theory to guide the follow-up treatment of depression. Meyer et al. [24] used chemical images with CNNs and molecular fingerprints with random forests and chemical-structure-derived molecular representations to predict specific medical subheading (MeSH) "therapeutic uses" classes that do not require empirical measurements of effects of chemicals, hence broadly apply to predict drug classes after training.
In this study, we apply CNN-based ROM for real-time prediction of the drug trajectory and concentration field with various injection positions. According to our literature review, the CNN-based ROM has not been applied to the real-time spatial drug distribution prediction in vascular interventional treatment. Furthermore, we propose a new network-encoding architecture, establishing two convolutional paths avoid interference between two respective inputs: injection site and blood vessel geometry, thereby enhancing its learning efficiency and improving model performance. Next, we present the proposed ROM, including the data preparation and the architecture of the deep neural network. We then introduce the spatial-temporal drug trajectory and concentration prediction results of three types of blood vessels with various injection sites. Finally, we state our conclusions in the last section.

Methods
In this section, we present in detail the architecture of the proposed ROM based on deep CNN, including the data preparation, the architecture of CNNs and the model training.

Physical Model and Numerical Simulation
Our work considers two idealized geometries of straight and bifurcated blood vessels and a third case representing a human hepatic artery system. (See Figure 1a-c.) All the cases are with circular cross-section with centers in the symmetry plane. The drug is injected at the intersection of inlet section and the symmetry plane with uniform injection velocity and direction. Therefore, we only simulate the domain in the symmetry plane, reducing the 3D problem to 2D for economy of computational cost. The information of the circumferential component is thereby ignored, which has much less influence than that of axial and radial components. The geometry of the human hepatic artery system used in the current work is created from one of the most common anatomical configurations [10] and has been investigated in several studies [25][26][27]. The geometry parameters are derived from the published articles of Ishigami et al. [28] and Han et al. [29]. Appl  We assume that the drug transfer in the blood does not involve any physiological chemical processes and the mass of the liquid drug is mainly transferred by convection, as the influence of diffusion is relatively weak [9,33,34]. The scalar transport model is used to represent the drug concentration field: where s is the volumetric concentration of the liquid drug. The velocity waveform [10] is presented in subfigure of Figure 1c, where T represents the cardiac cycle and t = 0.1 T is the start of injection time, with the injection lasting 2 T. This model was proposed by Buchanan et al. [35] and applied to the similar study of targeted injection [11,36,37].
The PISO algorithm, one of the pressure-velocity coupling methods, is used and the We consider blood to be an incompressible Newtonian fluid, and solve the following continuity and linear momentum equations: where u f (m/s) is the velocity of blood flow, t (s) is the time, p (Pa) is the pressure, ρ (kg/m 3 ) is the density of blood, f (N) is the body force per unit volume and τ (Pa) is the shear stress tensor. The main diameter of the vessel is 6 mm. The mean velocity was 0.26 m/s, corresponding to a Reynold's number of 479.3. Newtonian fluid assumption is used in this study referring to the literature [30][31][32], as the influence of non-Newtonian behavior flow is not significant in large arteries. We assume that the drug transfer in the blood does not involve any physiological chemical processes and the mass of the liquid drug is mainly transferred by convection, as the influence of diffusion is relatively weak [9,33,34]. The scalar transport model is used to represent the drug concentration field: where s is the volumetric concentration of the liquid drug. The velocity waveform [10] is presented in subfigure of Figure 1c, where T represents the cardiac cycle and t = 0.1 T is the start of injection time, with the injection lasting 2 T. This model was proposed by Buchanan et al. [35] and applied to the similar study of targeted injection [11,36,37].
The PISO algorithm, one of the pressure-velocity coupling methods, is used and the IcoFoam solver in OpenFOAM is modified by embedding the above drug concentration control equation for numerical simulation.
ICEM CFD is used to generate the mesh of the computational domains for the three types of blood vessels. The length of the grid in the hepatic artery is set to 0.005 times the inner diameter and we use a refined mesh in both the high curvature areas (bifurcated part) and the near-wall areas. The maximum and minimum distances between the first layer grid and the wall are 0.0133 mm and 0.0089 mm, respectively. The total number of grids is 143,730. Figure 2 illustrates the computational mesh near the RHA and LHA bifurcation.

Data Preprocessing
We trained the three types of blood vessels separately and set 113 different injection positions uniformly distributed along the inlet cross-section for each type of blood vessel. The result of the numerical simulation for each injection position combined with the corresponding geometry of blood vessels to form one data point, with a total of 113 data points. These were randomly divided into 77 training data, 16 validation data and 20 test data. The mesh generation and the numerical simulation process for each datum were the same for each type of blood vessel. The injection locations in the test and validation data never appeared in training or verifying the generalization ability of the CNN model. It is worth mentioning that each data point includes the concentration fields at different times during the injection process and when performing model training, we extracted the numerical simulation results of the target moment to generate sample data. To highlight the features of the raw data and enhance the sensitivity of the neural network to data features, we preprocessed the generated dataset. The max-min normalization method was used to

Data Preprocessing
We trained the three types of blood vessels separately and set 113 different injection positions uniformly distributed along the inlet cross-section for each type of blood vessel. The result of the numerical simulation for each injection position combined with the corresponding geometry of blood vessels to form one data point, with a total of 113 data points. These were randomly divided into 77 training data, 16 validation data and 20 test data. The mesh generation and the numerical simulation process for each datum were the same for each type of blood vessel. The injection locations in the test and validation data never appeared in training or verifying the generalization ability of the CNN model. It is worth mentioning that each data point includes the concentration fields at different times during the injection process and when performing model training, we extracted the numerical simulation results of the target moment to generate sample data. To highlight the features of the raw data and enhance the sensitivity of the neural network to data features, we preprocessed the generated dataset. The max-min normalization method was used to speed up the computational time while effectively improving accuracy.

Injection Position and Geometry Representation
We investigated SDF and binary image methods to represent the injection position and the blood vessel geometry. With SDF, the geometry of the blood vessels and the injection location were represented as a level-set function in the study area [38,39]. Due to the simple geometry of the straight blood vessel, the drug concentration field could be accurately predicted using only a binary image of the injection location and geometry representation, which is transformed into a matrix of 120 × 120, shown in Figure 3a. We use a horizontal line through the injection site to indicate the injection location. The corresponding injected artery is presented in Figure 3b. The distance between the injection site and the lower artery wall is 5.5 cm. The inputs of the bifurcated blood vessels are the presented injection position and SDF geometry, as shown in Figure 3c,d. Unlike the case of the straight vessel, the value of the computational domain is set to represent the injection location, which is equal to the distance between the lower wall and the injection site. For the human hepatic artery system, in addition to the input strategy of the bifurcated blood vessel, we also explored another approach where SDF was used to represent both the injection location and the artery system geometry, as shown in Figure 3e,f. The former displays the SDF presented injection location, where the points on the horizontal line through the injection site are set to zero, then the values of other spatial points are set to be the nearest distance to the horizontal line. Figure 3f illustrates the SDF presented hepatic artery system, which we set to zero at the artery wall and the outlets. The SDF value of the region outside of the vascular geometry is masked, which allows the network to further focus on the drug distribution field within the blood vessels. The latter method achieves better prediction accuracy; for more detail, please refer to Section 3.2.1. For easy representation, the geometric horizontal and vertical coordinates of the three blood vessel types are all scaled to [0,1]. Appl

Architecture of CNNs
The proposed multi-layer network consisted of an encoding part and a decoding part. In the encoding part, convolutional layers extract the high-feature map [40] from the injection position and blood vessel geometry; in the decoding part, the highly coded information is decoded through multi-layer deconvolutions [41] to generate a drug concen-

Architecture of CNNs
The proposed multi-layer network consisted of an encoding part and a decoding part. In the encoding part, convolutional layers extract the high-feature map [40] from the injection position and blood vessel geometry; in the decoding part, the highly coded information is decoded through multi-layer deconvolutions [41] to generate a drug concentration field. For the bifurcated blood vessel and the human hepatic artery, we have two images as input, respectively, representing the injection position and the blood vessel geometry. Therefore, we propose two encoding network structures to receive the input. In Model 1, the artery geometry and the injection location are simultaneously fed into the unique path in the encoding part. In Model 2, the artery geometry and the injection location are fed into different convolutional paths in the encoding part and then the extracted features are stacked and fed into the decoding layers. After conducting a model analysis, we elected to adopt Model 1 for the bifurcated blood vessel and Model 2 for the human hepatic artery. Figure 4 illustrates the architecture of the two network models, in which the matrix shape for each layer of the network models is presented.

Model Training
Model training is an iterative process, where the difference between the target ( ) and the prediction ( ) are minimized. The difference is generally in the form of a loss function: where and , respectively, represent the blood vessel geometry and the injection position of the n-th blood vessel.
, refers to the numerical simulation results while , stands for the network prediction results. W denotes all the weights of the network layers; is the regularization coefficient; ‖ ‖ is the L2 regularization term for preventing model overfitting. Compared with average relative error, L2 is more sensitive to outliers and more conducive to neural networks to capture the variations in minor features in data.
The Adaptive Moment Estimation (ADAM) optimization algorithm is proposed [42] and the mini-batch strategy is adopted in the optimization process. The hyperparameters of the network training optimization algorithm are listed in Table 1. Table 1. Hyperparameters of the optimization algorithm: is the regularization coefficient. , and are the hyperparameters of the ADAM algorithm, denotes the learning rete.

Model Training
Model training is an iterative process, where the difference between the target (ψ) and the prediction (ψ) are minimized. The difference is generally in the form of a loss function: where s n and d n , respectively, represent the blood vessel geometry and the injection position of the n-th blood vessel. ψ(s n , d n ) refers to the numerical simulation results whileψ(s n , d n ) stands for the network prediction results. W denotes all the weights of the network layers; λ is the regularization coefficient; λW 2 is the L2 regularization term for preventing model overfitting. Compared with average relative error, L2 is more sensitive to outliers and more conducive to neural networks to capture the variations in minor features in data. The Adaptive Moment Estimation (ADAM) optimization algorithm is proposed [42] and the mini-batch strategy is adopted in the optimization process. The hyperparameters of the network training optimization algorithm are listed in Table 1.

Results
The prediction results of the network model are presented in Section 3.1, including the blood vessel with idealized geometry and the human hepatic arterial system. Then, a computational time comparison between the network prediction and numerical simulation is carried out. Furthermore, we investigate the influence of the network's hyperparameter in Section 3.2.

Prediction
Results of the Network Model 3.1.1. Blood Vessels with Idealized Geometry Figure 5 shows the prediction results of the drug concentration distribution at four different instants, 0.11 (t/T), 0.12 (t/T), 2.11 (t/T), 2.13 (t/T), which correspond to the times when the injection starts, stabilizes and ends and after the injection. The drug is injected from the central axis of the blood vessels. The first, second and third lines show the network prediction, the numerical simulation and the relative error between the two, respectively. As a reminder, we scale the length and height of the blood vessel, as well as the concentration field to [0,1]. As presented, the liquid drug flows along the central axis and the relative errors at the four different times are all less than 7%. After training, the accuracy of the proposed model, which is measured by the average relative error, could achieve 97.1%-which is satisfactory to meet the requirements of physician-assisted procedures in real time. Appl computational time comparison between the network prediction and numerical simulation is carried out. Furthermore, we investigate the influence of the network's hyperparameter in Section 3.2.

Prediction Results of the Network Model
3.1.1. Blood Vessels with Idealized Geometry Figure 5 shows the prediction results of the drug concentration distribution at four different instants, 0.11 (t/T), 0.12 (t/T), 2.11 (t/T), 2.13 (t/T), which correspond to the times when the injection starts, stabilizes and ends and after the injection. The drug is injected from the central axis of the blood vessels. The first, second and third lines show the network prediction, the numerical simulation and the relative error between the two, respectively. As a reminder, we scale the length and height of the blood vessel, as well as the concentration field to [0,1]. As presented, the liquid drug flows along the central axis and the relative errors at the four different times are all less than 7%. After training, the accuracy of the proposed model, which is measured by the average relative error, could achieve 97.1%-which is satisfactory to meet the requirements of physician-assisted procedures in real time.     Figure 6 displays the prediction results of the concentration field in the bifurcated blood vessel for four different injection positions at t = 2.1 (t/T) in the test dataset. The distances between the injection position and the central axis are 0.8 R, 0.7 R, 0.55 R and −0.14 R, respectively. The figure clearly shows that all of the liquid drug flows into the branch with the 0.8 R injection position, while all the drug remains in the parent artery with the −0.14 R injection position. In the other two cases, the drug is divided into two parts at the bifurcation. The numerical simulation and prediction results are in good agreement, which confirms that the proposed model has the ability to capture minor features, such as drug shunting and reflux. The average accuracy of the prediction results achieves 99.4%, the maximum error is less than 6% mostly in the region with greatest drug concentration.
Appl. Sci. 2022, 12, x FOR PEER REVIEW 9 of Figure 6. Drug concentration field and error distribution in the bifurcated vessel: (a-d) are respe tively 0.8 R, 0.7 R, 0.55 R, −0.14 R from the central axis of the blood vessel.

The Human Hepatic Arterial System
Next, the network model was constructed for the hepatic artery system, buildin upon our previous study of the straight and bifurcated blood vessels. First, we present th prediction results at time t = 2.1 (t/T), for different injection locations, to investigate th prediction ability of the network model on different drug trajectories in this more comple flow geometry. Three representative concentration fields in the test dataset are presente in Figure 7, for three injection site locations: 0.2 R, −0.36 R and −0.8 R, measured from th central axis of the main vessel. As can be seen from the figure, the distribution of liqui drug into the different branches is highly dependent on the injection location. The erro cloud map is unified to the same error range for the convenience of observation indicatin the maximum relative errors of all computational domains are less than 6%. The greate error occurs locally at very few points, which results in an extremely high average accu racy, more than 99.9% for all the three data points. Furthermore, Figure 8A-E compare the concentration distribution of the numerical simulation and the network prediction with three various injection sites in Figure 7 at specified cross-sections, the schematic o the sections being presented in Figure 8F. Figure 8G compares the statistics of the tot

The Human Hepatic Arterial System
Next, the network model was constructed for the hepatic artery system, building upon our previous study of the straight and bifurcated blood vessels. First, we present the prediction results at time t = 2.1 (t/T), for different injection locations, to investigate the prediction ability of the network model on different drug trajectories in this more complex flow geometry. Three representative concentration fields in the test dataset are presented in Figure 7, for three injection site locations: 0.2 R, −0.36 R and −0.8 R, measured from the central axis of the main vessel. As can be seen from the figure, the distribution of liquid drug into the different branches is highly dependent on the injection location. The error cloud map is unified to the same error range for the convenience of observation indicating the maximum relative errors of all computational domains are less than 6%. The greatest error occurs locally at very few points, which results in an extremely high average accuracy, more than 99.9% for all the three data points. Furthermore, Figure 8A-E compares the concentration distribution of the numerical simulation and the network prediction, with Appl. Sci. 2022, 12, 10554 9 of 16 three various injection sites in Figure 7 at specified cross-sections, the schematic of the sections being presented in Figure 8F. Figure 8G compares the statistics of the total amount of drug in the various cross-sections between the numerical simulation and the network prediction. As previously, it is deduced that the proposed CNN-based model accurately predicts the drug concentration field and the quantity of drug delivered into each branch of the hepatic artery. In transarterial chemoembolization therapy, the amount of drug that ultimately flows to the targeted branch is critical to successful treatment. Hence, the good agreement in Figure 8G demonstrates the significant potential of the proposed model for assisting in clinical TACE treatment.  Further studies are carried out to test the performance of the proposed model on the prediction of the concentration field at different times in the hepatic artery system. The three selected times are: t = 0.15 [t/T], t = 2.1 [t/T] and t = 2.175 [t/T], the distance between the injection site and the central axis of blood vessels being −0.25 R. It can be seen from Figure 9 that the maximum relative error is less than 2.5% and the average error is less than 0.025%. The area with high concentration shows a greater relative error and as the drug continues to spread, the concentration decreases and the accuracy of the prediction in the network model is more accurate. Further studies are carried out to test the performance of the proposed model on the prediction of the concentration field at different times in the hepatic artery system. The three selected times are: t = 0.15 t/T , t = 2.1 t/T and t = 2.175 t/T , the distance between the injection site and the central axis of blood vessels being −0.25 R. It can be seen from Figure 9 that the maximum relative error is less than 2.5% and the average error is less than 0.025%. The area with high concentration shows a greater relative error and as the drug continues to spread, the concentration decreases and the accuracy of the prediction in the network model is more accurate.

Computational Time for Network Prediction and Numerical Simulation
The time costs for both the numerical simulation and the network prediction are listed in Table 2. The second column is the time required for CNN model training, the third column is the prediction time and the fourth column is the time OpenFOAM takes

Computational Time for Network Prediction and Numerical Simulation
The time costs for both the numerical simulation and the network prediction are listed in Table 2. The second column is the time required for CNN model training, the third column is the prediction time and the fourth column is the time OpenFOAM takes to solve the two-pulse simulation of drug flow at any injection position. Although considerable time is needed for training, the network model prediction time is nearly four orders of magnitude faster than that obtained with numerical simulation. The prediction time of the proposed CNN-based model is less than 0.1 s, which can achieve the purpose of real-time assisting clinical treatment. It is worth noting that from the straight and bifurcated blood vessels to the human hepatic artery, the vascular geometry becomes more complex and, therefore, finer meshes are required to achieve convergence in the numerical simulations, resulting in increased time consumption. However, the network model does not have the same convergence issue and the prediction time remains almost constant as the geometry varies. Table 2. Time consumption of the network model prediction and numerical simulation.

Hyperparameter Study
In this section, we investigate the influence of several important factors on the proposed model: (1). the model structure and input representation; (2). the number of the layers of the proposed model; (3). the learning rate. We optimize the network by using the evaluation criterion of average error.

Influence of Model Structure and Input Presentation
Two networks are investigated to predict the concentration field of the injected human hepatic artery. The first model uses the binary image presented injection location and the SDF presented vascular geometry as input, which, together, are fed into the single convolutional path in the coding part and then encoding and decoding are successively performed to generate a drug concentration distribution, as shown in Figure 4a. The network prediction, numerical simulation and relative error of the concentration field at t = 2.1 (t/T) are summarized in Figure 10 for the case of injection site at −0.25 R. The maximum error of the first model on the prediction of the drug concentration field is less than 6%. Although the model performed well, greater accuracy desired for clinical translation. As presented in Figure 4b, the second network has two paths of convolutional layers in the coding part, where the injection position and blood vessel geometry are separate inputs for encoding and then the extracted features, once stacked, are input to the decoding layers. Moreover, SDF is employed to represent both the injection site and the blood vessel geometry, as presented in Figure 3. Comparing the prediction result of the first model in Figure 10 and of the second model in Figure 9 in the previous section, one can see that Model 2 performs better than Model 1, the average concentration field prediction error being about 0.012% and 0.038%, respectively. The probable reason is that when the injection position and blood vessel geometry are superimposed into the coding layer, they easily interfere with each other and there are fewer weights that the model can learn, while when the two inputs are convolved separately, the model can obtain more effective and learnable information.
Model 2 performs better than Model 1, the average concentration field prediction error being about 0.012% and 0.038%, respectively. The probable reason is that when the injection position and blood vessel geometry are superimposed into the coding layer, they easily interfere with each other and there are fewer weights that the model can learn, while when the two inputs are convolved separately, the model can obtain more effective and learnable information.

Influence of Number of Network Layers
It is widely accepted that the learning ability of the network can be enhanced by increasing the number of network layers. However, given the limitations of time and computing resources, it is desirable to find an optimal network depth during the model design process. Here, we studied five network depths (L = {2, 3, 4, 5, 6}) for the bifurcated blood vessel model with fixed epoch, and other hyperparameters held constant in the training.
The corresponding convergence history curves are provided in Figure 11. It is apparent that the validation loss decreases significantly as the number of network layers increases. It is particularly interesting to note that the loss is relatively large between the case of two and that of three layers. The results confirm that the deep neural network models, while being more time consuming, have stronger learning ability than shallow neural network models. Meanwhile, it is also known that as the number of network layers increases, it eventually reaches a point of diminishing returns. Therefore, when optimizing the neural network structure, it is necessary to balance the prediction accuracy with computational time cost. In doing so, we determined that six layers were appropriate for the current network structure.

Influence of Number of Network Layers
It is widely accepted that the learning ability of the network can be enhanced by increasing the number of network layers. However, given the limitations of time and computing resources, it is desirable to find an optimal network depth during the model design process. Here, we studied five network depths (L = {2, 3, 4, 5, 6}) for the bifurcated blood vessel model with fixed epoch, and other hyperparameters held constant in the training.
The corresponding convergence history curves are provided in Figure 11. It is apparent that the validation loss decreases significantly as the number of network layers increases. It is particularly interesting to note that the loss is relatively large between the case of two and that of three layers. The results confirm that the deep neural network models, while being more time consuming, have stronger learning ability than shallow neural network models. Meanwhile, it is also known that as the number of network layers increases, it eventually reaches a point of diminishing returns. Therefore, when optimizing the neural network structure, it is necessary to balance the prediction accuracy with computational time cost. In doing so, we determined that six layers were appropriate for the current network structure.

Influence of Learning Rate
The learning rate is one of the most important hyperparameters affecting the performance of the network model. The value of the learning rate represents the speed at which the weight of the network is updated. Although a small learning rate can ensure that the model does not pass over any local minima, it also requires more time to achieve convergence and it is easy to end up in local minima. Hence, we need an optimal value for the learning rate, which is often obtained through coarse and fine tuning. The optimization process of the learning rate for the bifurcated blood vessel geometry is shown in Figure  12. First, in the coarse-tuning stage, we studied the four learning rates: 1 × 10 −3 , 1 × 10 −4 , 1 × 10 −5 , 1 × 10 −6 , as shown in Figure 12a. It can be observed that larger learning rates cause oscillations of losses and, as the learning rate decreases, the convergence of losses becomes smoother. Nevertheless, a small learning rate leads to a low speed of convergence and

Influence of Learning Rate
The learning rate is one of the most important hyperparameters affecting the performance of the network model. The value of the learning rate represents the speed at which the weight of the network is updated. Although a small learning rate can ensure that the model does not pass over any local minima, it also requires more time to achieve convergence and it is easy to end up in local minima. Hence, we need an optimal value for the learning rate, which is often obtained through coarse and fine tuning. The optimization process of the learning rate for the bifurcated blood vessel geometry is shown in Figure 12. First, in the coarse-tuning stage, we studied the four learning rates: 1 × 10 −3 , 1 × 10 −4 , 1 × 10 −5 , 1 × 10 −6 , as shown in Figure 12a. It can be observed that larger learning rates cause oscillations of losses and, as the learning rate decreases, the convergence of losses becomes smoother. Nevertheless, a small learning rate leads to a low speed of convergence and then more time is needed to obtain the optimal solution. With 100 epochs, the learning rate of 1×10 −4 shows the least loss; accordingly, we explored six values around 1 × 10 −4 in the fine-tuning stage: 1 × 10 −3 , 8 × 10 −4 , 6 × 10 −4 , 4 × 10 −4 , 2 × 10 −4 , 1 × 10 −4 , 6 × 10 −5 . As shown in Figure 12b, the learning rate of 6 × 10 −4 results in the least loss for the proposed model. the weight of the network is updated. Although a small learning rate can ensure that the model does not pass over any local minima, it also requires more time to achieve convergence and it is easy to end up in local minima. Hence, we need an optimal value for the learning rate, which is often obtained through coarse and fine tuning. The optimization process of the learning rate for the bifurcated blood vessel geometry is shown in Figure  12. First, in the coarse-tuning stage, we studied the four learning rates: 1 × 10 −3 , 1 × 10 −4 , 1 × 10 −5 , 1 × 10 −6 , as shown in Figure 12a. It can be observed that larger learning rates cause oscillations of losses and, as the learning rate decreases, the convergence of losses becomes smoother. Nevertheless, a small learning rate leads to a low speed of convergence and then more time is needed to obtain the optimal solution. With 100 epochs, the learning rate of 1×10 −4 shows the least loss; accordingly, we explored six values around 1 × 10 −4 in the fine-tuning stage: 1 × 10 −3 , 8 × 10 −4 , 6 × 10 −4 , 4 × 10 −4 , 2 × 10 −4 , 1 × 10 −4 , 6 × 10 −5 . As shown in Figure 12b, the learning rate of 6 × 10 −4 results in the least loss for the proposed model.

Discussion
This study reports a deep-learning-based reduced-order model for prediction of drug trajectory and concentration field. After further investigations, it is expected to work as an auxiliary tool to assist physicians to direct the drug to the tumor site in clinical chemoembolization therapy treatment. Although the traditional numerical simulation can also give an accurate calculation with the known PDEs, the developed network model can avoid the numerical iterations and, thus, allows for extremely fast predictions of the physical fields, which is essential for a real-time treatment auxiliary tool.
Furthermore, we propose a purely data-driven modeling method and a CNN architecture for the reduced-order model. Although physics-informed machine learning (PINN) is currently the leading direction in physics research involving machine learning,

Discussion
This study reports a deep-learning-based reduced-order model for prediction of drug trajectory and concentration field. After further investigations, it is expected to work as an auxiliary tool to assist physicians to direct the drug to the tumor site in clinical chemoembolization therapy treatment. Although the traditional numerical simulation can also give an accurate calculation with the known PDEs, the developed network model can avoid the numerical iterations and, thus, allows for extremely fast predictions of the physical fields, which is essential for a real-time treatment auxiliary tool.
Furthermore, we propose a purely data-driven modeling method and a CNN architecture for the reduced-order model. Although physics-informed machine learning (PINN) is currently the leading direction in physics research involving machine learning, the proposed purely data-driven model requires far fewer computational resources and time for training to make the model converge. In addition, benefiting from the CNN structure, the model can discretize the flow field as pixel-level figures/matrix; therefore, the CNN-based model can be adapted to any geometry easily, much easier than fully connected network-based PINN. CNN architecture enables the model to accurately predict the drug concentration field with relatively small databases, which is important in the biomedical/clinical field, because, sometimes, the data is limited due to privacy concerns or rare diseases.
The results show a high accuracy of 99.9% in the most complex geometry, the representative hepatic system, which verifies the great performance of the proposed ROM. The high accuracy is also related to the evaluation method of average accuracy. In this study, the drug injection is concentrated from a very tiny point and the amount is small. Therefore, the drug has a small distribution range. However, we considered the entire flow field in the artery when calculating the accuracy (because it is not easy to define which area should be counted), so the accuracy reaches 99.9%.
It is acknowledged that this preliminary study has some limitations. We only focused on the injection location without consideration of other potentially influential factors such as injection direction and injection time windows. In addition, only one representative geometry of the human hepatic artery system was studied. However, to assist clinical therapy, patient-specific anatomies should be considered. In other words, a geometry-adaptive model [18] should be explored. Additionally, our study is limited to two dimensions, which is not the case for real drug delivery and mass transfer problems in blood vessels. Hence, rather than providing a predictive model that can be directly used in the clinic, the main contribution of this study should be the proposition of the new idea and method.
To the best knowledge of the authors, this is the first work on real-time prediction of drug trajectory and concentration field after injections. However, for the purpose of assisting image-guided clinical therapy, further investigations should be explored, such as predictions under three-dimensional CT or MRI-based patient-specific anatomies and construction of an ROM with the geometry and injection condition adaptive ability.

Conclusions
In the current study, we report the performance of a data-driven ROM based on a deep convolutional neural network for real-time prediction of the spatial-temporal drug trajectory and concentration field in blood vessels. The model is trained and validated using the drug delivery datasets at different injection locations and the performance of the neural network model is further tested with cases that have never been seen by the network model. The average prediction accuracy in straight and bifurcated blood vessels reaches 97.1% and 99.4%, respectively. Applying the two convolutional paths to encode the features of vessel geometry and injection location, for the most complex problem, the hepatic artery system, the accuracy reaches 99.9%.
Although the proposed model requires about 35 min to train, the prediction time for each concentration field is less than 0.07 s, which is more than 4 orders of magnitude faster than that of numerical (CFD) simulations. The current model is hoped to provide a useful clinical tool to facilitate transarterial drug injection treatment of liver cancer patients, in real time, while providing accurate prediction of the drug trajectory and concentration field. This tool should assist physicians to direct the drugs to specific tumor-bearing segments, thus, improving the efficacy of treatment. In addition, the proposed method shows great potential for use in clinical interventional treatment of other diseases.