1. Introduction
Particulate matter emission into the atmosphere is a global health challenge, and emissions are increasing every day [1]. The rising number of patients with respiratory disease is evidence of the health hazards of atmospheric pollution [1]. Precise knowledge of atmospheric aerosol transport and deposition in the airways of the human lung is essential for health assessment and drug delivery purposes [2]. To date, researchers have employed wide-ranging approaches (in silico, in vivo and in vitro) to understand aerosol transport to the airways. The computational fluid dynamics (CFD) approach is the most popular method for studying airflow and particle transport in airways [3,4], and most studies to date have used it. Almost all of the published literature analyses aerosol transport in the upper airways only [5,6,7]. Cheng et al. [8] performed an in vivo study of ultrafine aerosol transport and deposition in human oral and nasal airways. The analysis used casts of 10 healthy adult males. The study reports that the smaller cross-sectional area of the extrathoracic airways and the complex airway shape influence the overall ultrafine particle deposition pattern. Later, Zhang et al. [6] studied the transport of ultrafine particles and microparticles in an idealised mouth–throat and upper airway model. The symmetric three-generation (G0–G3) model was used to investigate the aerosol deposition pattern for different flow rates. The study reports a higher deposition concentration at the carinal angle region of the airways than in the bifurcating airways. In 2008, Zhang et al. [9] considered a 16-generation airway model and analysed nanoparticle transport and deposition. The study did not use an asymmetric whole-lung model for the 16 generations; instead, several three-generation models were used to analyse the particle deposition.
To date, the best available large-scale asymmetric airway model, with the largest number of bifurcating branches, is that of Schmidt et al. [10], who developed a highly complex airway model for the first 17 generations of human airways. Gemci et al. [5] used the model of Schmidt et al. [10], a digital reference model with 1453 bronchi up to the 17th Horsfield order, and analysed the airflow in a large-scale model for the first time, improving the knowledge of the field. Later, in 2017, Islam et al. [7] used the same 17-generation airway model and comprehensively analysed microparticle transport and deposition. That study was the first to analyse the deposition pattern under different physical activity conditions (resting, light activity and heavy physical activity). However, the CFD simulation of a three-dimensional, anatomically realistic 17-generation lung model based on high-resolution computed tomography (HRCT) data requires huge time and computational resources. The 17-generation model used by Islam et al. [7] took approximately 55 days for a single simulation on a high-performance computing system. That study did not consider all branches of the 17-generation model; including them would significantly increase the computational cost further. Islam et al. [11] also studied ultrafine particle transport and deposition to the lower airways of the 17-generation model and, for the first time, comprehensively analysed the lobe-specific deposition pattern. The study calculated the deposition hot spots for the ultrafine particles, and its computational cost was also very high. The computational cost of a CFD model is extremely high because of the huge number of computational cells, and it is nearly impossible to simulate a whole lung model (23 generations) with all bifurcating branches. Therefore, coupling machine learning algorithms (MLAs) with CFD simulation has attracted wide attention because it can accelerate prediction and save time and computational cost [12].
Recently, many researchers have combined MLAs with CFD to save the time and cost required by CFD simulation. Ref. [13] employed the random forest (RF) technique to predict the viscosity of nanofluids. Kwon et al. [14] found that the RF model agreed with the CFD results in estimating convective heat transfer in a channel. Jamei et al. [15] used different MLAs to estimate the specific heat capacity of nanofluids. Although researchers have applied MLAs to CFD data in different engineering applications, no ML model has been developed for aerosol transport and deposition to the lower airways.
Therefore, the objective of this study is to develop an ML prediction model for pharmaceutical aerosol transport to the lower airways of a 17-generation model. ML regression models are built from the CFD dataset generated in our previous work. For this purpose, four different MLAs are adopted to significantly decrease the computational cost and to find the impact of each parameter on the estimated results. The ML models are then compared using statistical parameters, and the best one is selected. Finally, the best model is used to predict and analyse particle transport and deposition (TD) in the transitional and respiratory zones (deeper airways) of the human lung for different particle diameters and flow rates.
2. Problem Definition and Numerical Methods
The present study extends our previous work, in which we considered a highly asymmetric 17-generation airway model (Figure 1) for aerosol transport prediction. The airway model is constructed from the airway dimensions of Schmidt et al. [10], and the final model consists of 1453 bronchioles. The large-scale 17-generation model is one of the best available airway models in the literature. Simulating airflow and particle transport in a whole lung model is computationally very expensive because the governing equations must be solved over millions of computational cells. The 17-generation model consists of 7 million cells, and the multiphase flow simulation for a single case takes approximately 55 days on a high-performance computing system. A precise understanding of how the various shape- and size-specific parameters affect aerosol transport to the lower airways of the large-scale model requires a large number of simulations and, therefore, a significant amount of time. Our previous study analysed particle transport numerically and developed a database of aerosol transport for various flow rates and particle diameters [11]. In the present work, the database is used to develop an ML model that can accurately predict aerosol transport to the lower airways for various aerosol sizes and flow rates.
The numerical study solved the mass and momentum equations for the primary and dispersed phases [7]. A grid independence study and validation against benchmark experimental, numerical and theoretical data were conducted for the numerical model. The detailed numerical methods, grid refinement and validation of the numerical data are available in the authors' previous studies [7,11]. The previous study developed the database for the deposition efficiency at the right lung and the left lung of the airway model.
Table 1 shows the deposition data at the right lung and the left lung for different flow rates.
Table 2 shows the deposition data at the right (Upper, Middle and Lower) and left (Upper and Lower) lung lobes for different flow rates.
3. Methodology
The procedure adopted in the present work is shown in the flowchart in Figure 2. First, we collected the CFD data from our previous work for the ML step. The goal of using MLAs with CFD simulations was to decrease the computational time and, thus, the computational cost. The four MLAs used in the present analysis are k-nearest neighbours (k-NN) [16], Gaussian process regression (GPR) [17], random forest (RF) [18] and multi-layer perceptron (MLP). The best of the four MLAs was then used for optimisation. Applying MLAs requires hyper-parameter tuning to obtain the optimum result. The GridSearchCV tool, available in the Scikit-learn library [19], was used to obtain the optimised values of the important hyper-parameters.
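The hyper-parameter tuning step can be illustrated with a minimal sketch. It is not the configuration used in this study: the data are synthetic placeholders rather than the CFD database, and the parameter grid, the number of folds and the scoring metric are assumptions chosen only to show how GridSearchCV is typically called.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic placeholder data (NOT the CFD database): two input features
# standing in for particle diameter and flow rate, and one target.
rng = np.random.default_rng(0)
X = rng.uniform([1.0, 7.5], [10.0, 60.0], size=(60, 2))
y = 0.5 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.1, 60)

# Hypothetical grid of candidate hyper-parameter values.
param_grid = {
    "n_neighbors": [2, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

# Exhaustive search over the grid with 5-fold cross-validation.
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
print(best_params, search.best_score_)
```

The same pattern applies to the other three regressors by swapping the estimator and its grid.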
Machine Learning Regression Models
CFD problems involving multiple variables can be handled readily by MLAs. Further, the number of variables can be easily increased in MLAs (Jalalifar et al., 2020), as they are proficient in handling many input and output quantities. The present study has three independent variables (particle size, flow rate and lung position) and one dependent variable (deposition efficiency). The functional form of the model is presented below:
\[ \text{deposition efficiency} = f(\text{particle size},\ \text{flow rate},\ \text{lung position}) \]
Two categories of models are developed in the present study. In Category-I models, we selected the left and right lungs only, while in Category-II models, we selected the left lower, left upper, right lower, right middle and right upper lobes to achieve a more detailed analysis.
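One way to present the three inputs to a regression model is sketched below. The paper does not state how the lung position was encoded, so the one-hot encoding, the function name `encode_sample`, the units in the argument names and the numerical values are all assumptions made for illustration.

```python
import numpy as np

# Lung positions for the two model categories described in the text.
CATEGORY_I = ["left", "right"]
CATEGORY_II = ["left lower", "left upper", "right lower",
               "right middle", "right upper"]

def encode_sample(diameter_um, flow_rate_lpm, position, positions):
    """Return [diameter, flow rate, one-hot position...] as a 1-D array.

    Hypothetical helper: one-hot encoding is an assumed choice, not the
    encoding reported by the authors.
    """
    one_hot = np.zeros(len(positions))
    one_hot[positions.index(position)] = 1.0
    return np.concatenate(([diameter_um, flow_rate_lpm], one_hot))

# Illustrative values only.
row_i = encode_sample(5.0, 15.0, "right", CATEGORY_I)
row_ii = encode_sample(5.0, 15.0, "right middle", CATEGORY_II)
print(row_i)   # 2 numeric features + 2 position flags
print(row_ii)  # 2 numeric features + 5 position flags
```

A Category-II row therefore carries more position flags than a Category-I row, which is what allows the lobe-level analysis.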
The four ML techniques selected in this work have been extensively used for regression problems [20,21].
k-nearest neighbour (k-NN) [22]: The k-NN MLA is robust, computationally inexpensive and easy to implement. The outcome of a test point is computed by interpolating the values of its nearest neighbours. The number of neighbours, n, is specified by the user, and the neighbours are the n training points with the minimum Euclidean distance to the test point in the input feature space. The k-NN MLA thus estimates an outcome as a weighted average of the data points with similar input features.
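The neighbour search and weighted average described for k-NN can be sketched in a few lines. The inverse-distance weighting, the function name `knn_predict` and the toy data are assumptions for illustration; the text does not specify which weights were used.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, n=3, eps=1e-12):
    """Predict one value as an inverse-distance-weighted average of the
    n nearest training points (Euclidean distance in feature space)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Euclidean distance from the query point to every training point.
    dists = np.linalg.norm(X_train - x_query, axis=1)

    # Indices of the n closest training points.
    nearest = np.argsort(dists)[:n]

    # Closer neighbours get larger weights; eps avoids division by zero.
    weights = 1.0 / (dists[nearest] + eps)
    return float(np.sum(weights * y_train[nearest]) / np.sum(weights))

# Toy one-dimensional data (illustrative only).
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 10.0]
pred = knn_predict(X_train, y_train, [1.5], n=2)
print(pred)
```

In this toy case the two nearest neighbours are equidistant from the query, so the prediction is simply their mean.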
Random forest (RF) [23]: RF is an ensemble model that improves predictive ability by combining many models. It is based on a method called bagging, in which several decision trees are created, each trained on a sample drawn from the input data with replacement. The final result is the average of the results of all trees. The RF method is widely used because of its prediction capability.
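The bagging idea behind RF can be made concrete with a short sketch. It shows bootstrap sampling and averaging only; a full random forest additionally restricts the features considered at each split, which is omitted here. The helper names, the number of trees and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagged_trees(X, y, n_trees=25, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeRegressor(random_state=0)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_bagged(trees, X):
    """Average the predictions of all trees."""
    return np.mean([tree.predict(X) for tree in trees], axis=0)

# Synthetic placeholder data (NOT the CFD database).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(80, 2))
y = X[:, 0] + 2.0 * X[:, 1]

trees = fit_bagged_trees(X, y)
y_hat = predict_bagged(trees, X)
print(np.mean((y_hat - y) ** 2))
```

In practice, scikit-learn's `RandomForestRegressor` wraps these steps and adds the per-split feature subsampling.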
Gaussian process regression (GPR) [24]: Gaussian processes can be used to represent random variables in ML, such as the unknown error term in a linear regression model. In GPR, the data are represented by a distribution similar to the normal distribution curve. The Gaussian (normal) distribution has two key parameters, i.e., mean (μ) and variance (σ²) (“Gaussian Distribution for Machine Learning and Data Science (Normal Distribution) | by Hemanth Nhs | Medium”). The following equation gives the Gaussian distribution for a dataset:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \]
where μ is the average value and σ is the standard deviation of the dataset.
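A minimal GPR sketch with Scikit-learn is given here to show how a fitted model returns both a predicted mean and a standard deviation at a new point. The kernel (RBF plus white noise), its initial values and the synthetic one-dimensional data are assumptions for illustration and are not taken from this study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic placeholder data (NOT the CFD database): a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0.0, 0.05, 20)

# Assumed kernel: smooth RBF term plus a white-noise term.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                               random_state=0)
gpr.fit(X, y)

# Predictive mean and standard deviation at an unseen input.
X_new = np.array([[2.5]])
mean, std = gpr.predict(X_new, return_std=True)
print(mean[0], std[0])
```

The returned standard deviation is what distinguishes GPR from the other three regressors, which give point predictions only.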
Multi-layer perceptron (MLP) [25]: MLP can offer improved output by learning non-linear data patterns through network modelling. An MLP has a minimum of three layers: an input layer, a hidden layer and an output layer. Each node, except the input nodes, is a neuron with a non-linear activation function. The MLP is trained by back propagation using the Levenberg–Marquardt algorithm. It uses a hyperbolic tangent sigmoid transfer function in the hidden layers and a linear function in the output layer.
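The network structure described for the MLP (tanh hidden layer, linear output) can be sketched with Scikit-learn's `MLPRegressor`. Scikit-learn does not provide a Levenberg–Marquardt solver, so this sketch substitutes the L-BFGS solver; the hidden-layer size, the input scaling step and the synthetic data are likewise assumptions and do not reproduce the authors' setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (NOT the CFD database).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2

# One hidden layer of 10 tanh neurons (hypothetical size); MLPRegressor
# always uses a linear (identity) output layer.
# Solver: L-BFGS, used here in place of Levenberg-Marquardt.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                 solver="lbfgs", max_iter=2000, random_state=0),
)
mlp.fit(X, y)

r2 = mlp.score(X, y)  # coefficient of determination on the training data
print(r2)
```

Scaling the inputs before training is a common precaution for tanh networks, since unscaled inputs can saturate the activation.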
5. Conclusions
The present work advances the deposition analysis in the upper 17 generations of the human respiratory tract. Previously, researchers used CFD analysis, whereas the present work develops an ML-based model for the first time. The ML prediction models are developed from the CFD data of a 17-generation large-scale model. Different ML regression models are trained and tested with the CFD data, and statistical analysis is performed to determine the best model for predicting pulmonary aerosol transport and deposition in the upper 17 generations of the human respiratory tract. The MLP model performed well compared to the other regression models, with an overall GPI of 1.603. Furthermore, out of eight statistical indicators, the MLP model has significant values for six. Therefore, the MLP regression model is used to predict the DE for a wide range of flow rates and particle sizes.
The MLP model predicts the DE in both lungs. The prediction shows higher DE in the right lung than in the left lung. The MLP model also reports higher DE at the left and the right lung irrespective of the inlet conditions and particle diameter. It is also observed that the MLP model gives excellent predictions with a similar trend of deposition efficiency as observed in our CFD work.
A comprehensive lobe-specific DE is calculated using the ML prediction model. The overall DE at the right lower lobe is higher than at the remaining lobes. The DE of particles of various diameters differs between lobes and flow rates. The MLP prediction model analysed the deposition hot spots at various lobes for the first time, which would improve the knowledge of aerosol deposition in the lower airways and could potentially improve the efficiency of targeted drug delivery to the lower airways. Moreover, the MLP model can be used to predict deposition efficiency for flow rates and particle sizes not considered in the CFD analysis, resulting in considerable savings in computational time and cost. The present study, along with physics-informed ML modelling of the airflow and particle transport in airways, would further improve the knowledge of the field.