Rational Design of Field-Effect Sensors Using Partial Differential Equations, Bayesian Inversion, and Artificial Neural Networks

Silicon nanowire field-effect transistors are promising devices used to detect minute amounts of different biological species. We introduce the theoretical and computational aspects of forward and backward modeling of biosensitive sensors. Firstly, we introduce a forward system of partial differential equations to model the electrical behavior, and secondly, a backward Bayesian Markov-chain Monte-Carlo method is used to identify the unknown parameters such as the concentration of target molecules. Furthermore, we introduce a machine learning algorithm according to multilayer feed-forward neural networks. The trained model makes it possible to predict the sensor behavior based on the given parameters.


Introduction
Silicon nanowire (SiNW) field-effect transistors (FETs) are typically used to detect proteins [1], cancer cells [2], DNA and miRNA strands [3,4], enzymes [5], and toxic gases such as carbon monoxide [6,7]. The sensors have several advantages including fast response, very high sensitivity, and low power consumption; they do not need labeling and can be used to detect subpicomolar concentrations of biological species [8][9][10][11][12][13]. The functioning of the sensors is based on the field effect due to the (partial) charges of the target molecules. When they are selectively bound to probe molecules and close enough to the semiconducting transducer, they affect the charge concentration inside the nanowire, which changes the current through the nanowire.
Using mathematical models based on partial differential equations (PDEs) enables us to model physically relevant quantities such as electrostatic potential, electron and hole current density, device sensitivity to the target molecule and signal-to-noise ratio [14][15][16][17][18]. The three-dimensional simulations give rise to more reliable models compared to twodimensional cross-sections, since all target molecules bound to bio-receptors will be included [19,20]. We couple a charge transport model (the drift-diffusion equations) and the nonlinear Poisson-Boltzmann equation (PBE) for fully self-consistent simulations. The system of equations is a comprehensive model to compute the electrical current and study the nonlinear effects of different semiconductor parameters (e.g., doping concentration) and device parameters such as nanowire type (radial, trapezoidal, radial, or rectangular),

The Model Equations
The drift-diffusion-Poisson system is used to describe the electrochemical interactions (Poisson-Boltzmann equation) and the charge transport (drift-diffusion equations) in fieldeffect sensors. The convex and bounded domain Ω ⊂ R 3 consists of four subdomains, namely the insulator (SiO 2 , Ω Si ), the silicon substrate and transducer (Ω Si ), the aqueous solution (Ω liq ), and the charged molecules (Ω mol ). To model the potential interactions, we use the Poisson-Boltzmann equation in Ω Si , 0 in Ω ox , where A indicates the dielectric constant, which is a function of the material, V is the electrostatic potential, C dop is the doping concentration, ρ is the surface charge of the molecules, φ F denotes the Fermi level, and ϕ is the ionic concentration. Regarding the electrical constants, we use the relative values A Si = 11.7, A ox = 3.9, A M = 3.7, and A liq = 78.4. Considering the Boltzmann constant k B , the temperature T and the elementary charge q, we define β = q/(k B T). In the simulations, a thermal voltage of 0.021 V will be used.
A two-dimensional cross-section of the device is given in Figure 1. At the interface between the insulator and the liquid (i.e., Γ := Ω ox ∩ Ω liq ), we impose the interface conditions for V I . Here, 0+ and 0− denote the limit at the interface on the side of liquid and insulator. Furthermore, α is macroscopic dipole moment density, and γ is the macroscopic surfacecharge density.
In Ω Si , we solve the drift-diffusion system ∇ · J n = qR(n, p), (3b) to model the charges in the transistor, where D n and D p are the electron and hole diffusion coefficients. The concentrations of electrons and holes are given by where n i is the intrinsic carrier density and Φ 1 and Φ 1 are the Fermi levels. In order to compute the electron and hole current densities, we use the Shockley-Read-Hall recombination rate, i.e., where τ n and τ p denote the lifetimes of the electrons and holes. For solving the nonlinear system of equations, we use the Scharfetter-Gummel iteration. For this, we write the concentrations n and p in terms of the two Slotboom variables u and v as Therefore, the model problem (3) can be rewritten as where U T is the thermal voltage and the Shockley-Read-Hall recombination rate takes the form At the ohmic contacts (backgate, source, and drain) and the solution gate, we have a Dirichlet boundary condition V ∂Ω = V D consisting of At the source and drain contacts (on ∂Ω Si ), we apply For the remaining part of the domain, we impose a zero Neumann boundary condition to guarantee the self-isolation. We refer the interested reader to [15,19,33,34] for theoretical discussions about the model including the Slotboom variables. The existence and uniqueness of the solutions for deterministic and stochastic model problems are given in [15,35]. Finally, the computation of J n and J p enables us to calculate the electrical current as where we take the integral on a cross-section of the transducing part.
In this work, we use the finite element method (FEM) to solve the coupled system of equations. We define the spaces Therefore, we define the continuous solution space X := X 1 × X 2 × X 3 for the DDP system. Regarding the space discretization, we assume T h = {T 1 , T 2 , . . . , T n } denotes a quasiuniform mesh defined on Ω h ≈ Ω with mesh width h := max T j ∈T h diam(T j ). We define where P 1 is the space of first-order polynomials. Then, we have The discrete solution is defined as The weak form of the model equations can be found in [15,33]. The a prior and a posterior estimations are proved in [33]. More theoretical works regarding the finite elements analysis are given in [36][37][38].

Parameter Estimation Based on Bayesian Inference
In different experimental situations, an accurate estimation of the effective parameters and constants cannot be easily estimated. Bayesian inversion techniques based on Markov chain Monte Carlo methods are efficient and straightforward probabilistic techniques to estimate these unknowns. We initiate the algorithm using the available information, named prior knowledge (which may not be sufficiently accurate), and during several iterations, we can update the information and provide more reliable data (i.e., the posterior density). Then, we can extract valuable information from the posterior density, and its mean/median can be used as the solution of the interference. A very strong agreement with the experimental values and the model response can be achieved. We start a statistical model where M is the experimental observation (normally n− dimensional), while P is the solution of the model problem which depends on the set of parameters χ (i.e., χ = (χ 1 , χ 2 , . . . , χ k ) and the Cartesian coordinates x. Here, ε is the measurement error, and we assume that it is normally distributed, i.e., ε ∼ N (0, σ 2 I), including the parameter σ 2 .
Having an experimental observation, for instance electrical current (i.e., M = obs), we define the probability function Our aim is to estimate the posterior density π(χ|m), considering the measured observation m and the available prior information. For this, we compute the likelihood function is the sum of square errors. Obviously, if the model response with respect to the (set of) parameters χ will be closer to the measured value, the square error (15) will converge to zero, and its relative probability (computed by the likelihood function) will converge to 1. Inaccurate estimation of χ will increase the error term, and the probability will converge to zero.
In the Metropolis algorithm, we initiate the process using an initial guess χ 0 based on the prior density. According to the proposal distribution, a new candidate χ is proposed. We compute the acceptance rate by If the new candidate χ is accepted, we continue the MCMC chain with that; otherwise, (χ j−1 has a higher probability concerning χ ), we follow the chain with the previous candidate. Using a non-symmetric proposal density is a generalization of the Metropolis algorithm, introduced by Hastings [21], where the probability of the forward jump is not equal to the backward one. A summary of the algorithm is given in Algorithm 1.

Algorithm 1 The Metropolis-Hastings algorithm.
Initialization: Start the process with the initial guess χ 0 and number of samples N. while j < N 1. Propose a new sample according to the proposal density χ * ∼ T (χ * | χ j−1 ). 2. Compute the acceptance/rejection ratio The Metropolis-Hastings algorithm is a simple and versatile technique and has been widely used for several problems in applied science. However, for the high-dimensional cases (different parameters should be inferred simultaneously), the algorithm does not work appropriately, since the rejection rate increases significantly. To improve its computational drawbacks, different improvements, such as the adaptive Metropolis algorithm [22], delayed rejection Metropolis [23], and their combination, namely delayed rejection adaptive Metropolis (DRAM) [24]. We refer the interested readers to [26] as a review paper about the methods.

Mcmc with Ensemble-Kalman Filter (EnKF-MCMC)
In EnKF-MCM [25], we use a Kalman gain employing the mean and the covariance of the prior distribution and the cross-covariance between parameters and observations. It will be used to compute the proposal distribution and make the convergence to the target density faster. Here, the new candidate is computed as the jump of the Kalman-inspired proposal ∆χ as In order to update the candidates, we compute ∆χ by where K denotes the so-called Kalman gain, Here, C θ M indicates the covariance matrix between the identified unknowns and model response, C MM points out the covariance matrix of the model response, and R denotes the measurement noise covariance matrix [39]. In addition, y j−1 is the residual of the proposed values concerning the model and s j−1 ∼ N (0, R) relates to the density of measurement. A summary of the relative algorithm is given in Algorithm 2. Finally, Figure 2 shows the implementation of EnKF-MCMC and Schafetter-Gummel iteration for parameter estimation and solving the model equations.

Multilayer Feed-Forward Neural Networks
Neural networks are efficient, flexible, and robust simulation tools specifically for nonlinear and complicated problems. They consist of three effective components, including neurons, structures, and weights, which all affect the response and behavior of the network. Artificial neural networks (ANNs) are supervised machine learning algorithms consisting of neurons and hidden layers. The input data are processed into the hidden layers, the output is compared with the target trajectory, and the relative error is computed. The neural networks strive to minimize this error.
Typically, there are two common classes of neural networks, namely feed-forward neural networks (single or multilayers) and recurrent dynamics neural networks. Singlelayer neural networks [40] have less complexity; however, they are more suitable for linear problems. In multilayer feed-forward neural networks (MFNNs) [41,42], more than one layer of the artificial neurons will be used to enhance the capability to learn nonlinear patterns, which is more appropriate for BIO-FETs. In MFNNs, the neurons are organized in different non-recurrent layers, where in the first layer, we have the input vector (here are the parameters of the sensor), and the output is given to the first hidden layer. After the data processing, the data are transferred to the next layers using the weights; the procedure is followed until the latest MFNNs layer. These networks are also named multilayer perceptrons, and their structure is shown in Figure 3.
where w is the weights, η is the training rate, E is the network mean square error (MSE), δ is the sensitivity function (here, δ s indicates the network error in the jth layer), net s is the weighted input, n s is the number of neurons in the sth layer, x 0 is the network input, x s−1 is the output of the s − 1th layer, and it is also the input of the sth layer. We also have the following initial conditions for the recurrent process Figure 4 shows the jth neuron in the ith layer in the learning algorithm. In the recurrent process, in order to adjust the weights from the first layer, we follow as So, we can write δ s where The gradient of E (the difference between desired trajectory and the neural networks's output) with respect to the weight vector is given by where the second term depends only on the neurons features and takes Using the back-propagation error algorithm enables us to adjust the weight functions in order to minimize the network error. This training process is also named the supervised learning algorithm.

Numerical Experiments
As we already mentioned, the DDP system is a roust and reliable system of equations to model the electrical behavior of the FET devices. We use a prostate-specific antigen (PSA) sensitive sensor which is used to diagnose prostate cancer. For the simulations, we use a sensor device with the nanowire length of 1000 nm, width of 100 nm and height of 50 nm, which is coated with SiO 2 with 8 nm thickness. We use the P 1 finite element to solve the model problem, and tetrahedral meshes are employed to discretize the domain. A schematic of the bio-FET including dimensions using 6622 nodes and 45,735 tetrahedra is shown in Figure 5. The sensor is developed for the detection of 2ZCH (https://www. rcsb.org/structure/2ZCH). The PROPKA algorithm predicts the pK a values of ionizable groups in proteins and protein-ligand complexes based on the 3D structure. The values are the basis for understanding the pH-dependent characteristics of proteins and catalytic mechanisms of many enzymes [43]. To compute the net charge, we performed a PROPKA algorithm [44][45][46] to detect the net charge for different pH values. The simulations are completed using a pH value of 9, giving rise to the net charge of −15 q [14]. In field-effect sensors, surface reactions at the oxide surface depending on the pH value and the binding of charged target molecules result in changes in the charge concentration at and near the surface, and subsequently in changes in the electrostatic potential, which then modulates the current through the transducer. Since the molecules are negatively charged, the binding of the target molecules to the bio-receptors will enhance the charge conductance and increase the response of the sensor (i.e., the electrical current).
The system of equations is capable of modeling the surface charges at the surface. In a previous work, we developed a Monte-Carlo approach to simulate the charges around a charged biomolecule at a charged surface [47]. Furthermore, in [48], a nonlinear Poisson model was used to calculate the free energies of various molecule orientations in dependence of the surface charge. Based on the free energies, the probabilities of the orientations were calculated, and hence, the biological noise was simulated.

Model Verification
As the first step, we verify the model accuracy with the experiments. We compute the electrical current I with respect to different gate voltages V G where the source-to-drain voltage V SD = 0.2 V, doping concentration C dop = 1 × 10 16 cm −3 , and the thermal voltage U T = 0.021 V. The experimental data are taken form [20]. In order to solve the nonlinear coupled system of equations, a Scharfetter-Gummel-type iteration is used. Figure 6 shows the current as a function gate voltage varying between V G = −1 V and V G = −3.5 V for experimental and simulation values. These results indicate that the DDP system is reliable and will be used for the next simulations.

Bayesian Inversion
The molecules are negatively charged (here, −15 q is used); however, an accurate estimation of the molecule charge density will be necessary. In semiconductor devices, in order to enhance the conductivity, impurity atoms are added to the silicon lattice, namely the doping process. Higher doping concentration will improve the transistor conductivity; however, the device will be less sensitive to the charged molecules. Physically, doping concentration (as a macroscopic quantity) denotes the average amount of the dopants. We implemented a delayed rejection adaptive Metropolis (DRAM) [14] and the Metropolis-Hastings algorithm [1] to infer doping concentration, molecule charge density, and probetarget density. The efficiency of the EnKF-MCMC compared to these algorithms is studied in [26]. Therefore, we employ the Kalman filter for the proposal adaptation. We performed the MCMC algorithm with N = 10,000 iterations, and a uniform prior density is used. The computational aspects are summarized in Table 1.
The back-propagation error is an efficient algorithm for the training of neural networks where we compute the gradient of the loss function with respect to the weights of the network.
Employing a footprint of 10 nm for the molecules [20,49] gives rise to a surface charge of −1.5 q/nm 2 . In the experiments, a doping concentration of 1× 10 16 is used in the transducer (both values are selected as the true values). The posterior densities are shown in Figure 7. As expected, the posterior densities are around the true values. Regarding the surface charge, we have a normal distribution, and the charge cannot be positive (which is reasonable due to using P-type FET). For the doping concentration, the distribution points out that for C dop more than 2 × 10 16 , the sensitivity will reduce significantly, and almost all of the candidates are rejected.

Machine Learning Based on MFNNs
In this section, we employ MFNNs to train the machine according to available information from the sensors. The effective physical/geometrical parameters will have a nonlinear effect on the device output. For instance, for a doping concentration of more than C dop = 2 × 10 16 , the current will increase sharply, which is compatible with the results in Bayesian inversion (Figure 7). Due to this nonlinear behavior, the MFNNs algorithm is chosen to monitor the data accuracy and reliability and predict the sensor behavior.
More hidden layers will facilitate the convergence to the desired trajectory; however, it will increase dramatically the computational costs (e.g., computational time). In this work, we use two hidden layers for the MFNNs algorithm to strike a balance between complexity and efficiency. The procedure is shown in Figure 8. We define five specific scenarios according to the number of inputs. In Case 1, we only have one input (V g ) varying between −1 V and −5 V, where other parameters including insulator thickness, nanowire width (N W ), doping concentration, and nanowire height (N H ) are constant. In Case 5, we have five inputs, and the output is the calculated electrical current. Table 2 shows the range of the parameters used for different cases.  The MFNNs algorithm is trained with two learning rates (i.e., η = 0.1 and η = 0.2) and different numbers of epochs. Here, we use 75% of the samples for data training and 25% of the samples for data testing. The numbers of epochs and neurons in the 1st and 2nd hidden layers are given in Table 3. The sigmoid function is used as an activation function in hidden and output layers. In order to verify the efficiency/accuracy of the MFNNs structure algorithm, for different cases, we compare the output of the machine learning algorithm with the desired trajectories (computed currents). We have the relative MSE for the test and training process and performed a linear regression test to explain the relation between the targets and MFNNs output. Figures 9 and 10 show the results for Cases 1-5, where in all cases, there is a good agreement between the machine learning output and the sensor data.  In the second column, we have the relative MSE, and the regression test is given in the third column. Figure 10. The performance of the MFNNs algorithm for Case 4 (a) and Case 5 (b). In the first column, the desired trajectories (shown in blue) are compared with the MFNN output (shown in red). In the second column, we have the relative MSE, and the regression test is given in the third column.

Conclusions
In this work, we introduced a computational framework for modeling charge transport and electrostatic potential distribution in SiNW-FETs in order to enable the rational design of this sensor technology. The PDE-based model has been verified with the experimental data and showed its accuracy. Bayesian inversion can be used to determine quantities of interest such as molecule concentrations, surface charges, and doping concentrations.
Our approach and results can be extended to different types of sensors including plasma resonance-based biosensors, fluorescence-based sensors, and electrochemiluminescence-based biosensors that are used to detect biomarkers.
Finally, machine learning algorithms based on MFNNs have been developed for SiNW-FETs. Here, we use two hidden layers to deal with the nonlinear behavior of the current (with respect to the input parameters), where the method shows its computational efficiency. We used 75% of the data to train the machine and the remaining 25% for testing. In both cases, the obtained MSE shows the convergence to the desired trajectory. The results indicate that MFNNs are a suitable machine learning algorithm for SiNW-FETs and can be used to predict the sensor output behavior as a compact model.