1. Introduction
Semiconductor nanostructured materials, in particular metal oxides, attract great attention due to their numerous applications. These applications use the conductive, sensor, photoelectric, catalytic, magnetic, plasmonic and dielectric properties of nanomaterials (see, for example, [1,2,3,4,5,6,7,8,9,10,11]). Charge distribution over the volume of a nanoparticle plays a decisive role in determining such properties; not only is the electron distribution between volume and surface important, but so is the distribution over the radius of the nanoparticle. There exists a vast literature devoted to the description of electron distribution in semiconductor nanoparticles. Essentially, all the reported approaches are based on the ion-sorption model, which is described in detail in the monograph [
12]. This model takes into account the electron transfer processes between the adsorbate and the adsorbent. Such transfers affect the adsorbate’s bond with the surface of the semiconductor. Chemisorption of molecules at the surface leads to the emergence of localized energy levels in the band gap. Depending on their position with respect to the Fermi level, both electrons from the conduction band and holes from the valence band may be captured. Our previous research mostly considered donor semiconductors and assumed the existence of two kinds of nanoparticles, with low and high concentrations of conduction electrons, respectively. In the first case (i.e., low concentration), a negatively charged layer on the surface of the nanoparticle does not exist, and the electron distribution through the nanoparticle volume is uniform (see, for example, the paper [13]). In the opposite case of high electron concentration, a negatively charged surface layer emerges, and the charge distribution inside the nanoparticles becomes drastically non-uniform [14]. Later research improved the accuracy of the ion-sorption model for both one-component [15] and two-component [16] systems, while also presenting the electron distribution over the nanoparticle radius in the In2O3 and CeO2-In2O3 systems. These later works also compared the obtained charge distributions with experimental data on the conductivity and the sensor effect. Following the ion-sorption model as applied to the sensor process, and taking into account the numerous experimental data on the resistance of metal oxide films in the atmospheres of various gases, Electron Paramagnetic Resonance (EPR) studies of radical forms of adsorbates, as well as the data on temperature-programmed gas adsorption/desorption [
17,18,19,20,21,22,23,24,25,26,27,28], the following sequence of reactions on the nanoparticle surface may be proposed:
(1) Dissociative adsorption of an oxygen molecule on the surface of the metal oxide, accompanied by electron capture from the surface layer of the nanoparticle.
(2) Reaction of molecules of the reducing gas with a negatively charged adsorbed oxygen atom. It is taken into account here that, in the working temperature range of the conductometric sensor (200–400 °C), O− is the main ion on the oxide surface [19].
(3) The neutral product of the reaction is lost to the atmosphere, while the liberated electron is transferred to the nanoparticle volume. This process leads to an increase in the conductivity of the sensitive layer.
Comparison between the theory and experimental data [15,16] demonstrated good agreement for realistic parameters of the system, thus confirming the correctness of the proposed scheme of the sensor process. However, these works have a significant deficiency: the theory-based fitting of experimental data is conducted manually and takes a significant amount of time. Consequently, the quality and accuracy of the obtained results, as well as the amount of time spent, are essentially subjective, i.e., conditional on the researcher’s skills. This deficiency can be overcome by automating the inverse problem (search of parameters) using modern optimization techniques. The present study applies to this task, for the first time, the new global optimization method PROTES (Probabilistic Optimization with Tensor Sampling) [29], based on the machine learning paradigm and the low-rank tensor train (TT) decomposition [30]. This application constitutes the mathematical novelty of the presented study.
PROTES is particularly well-suited for high-dimensional optimization problems like the one in the present study, where the parameter space is five-dimensional and the objective function exhibits plateaus and possible noise. In practice, PROTES demonstrates several advantages over standard methods such as Bayesian optimization (BO) or evolutionary algorithms (EAs): (1) PROTES leverages low-rank tensor train (TT) decompositions to represent the search distribution compactly. This allows efficient sampling and updates even in multidimensional spaces, whereas BO and EAs often suffer from the “curse of dimensionality” and require an exponentially growing number of function evaluations. (2) Unlike EAs, which rely on population-based heuristics, PROTES uses a probabilistic model that is adaptively updated via gradient-ascent steps on the log-likelihood of top-performing samples. This enables faster convergence and a better exploration–exploitation trade-off. (3) PROTES samples multiple points per iteration and uses a top-k selection strategy, which inherently smooths out the effects of noise in the objective function. This makes it more robust compared to methods that rely on precise gradient information or small sample sets. The studies [
29,31] provide an extensive comparison of the PROTES optimizer and a closely related TTOpt optimizer (based on the same tensor train decomposition) with various alternative approaches, including gradient-based methods, BO, genetic algorithms, and EAs, applied to a wide class of model problems. Given these advantages and the strong empirical results reported in the literature, PROTES is a highly appropriate choice for the inverse problem considered in the present study.
In the last few years, the TT-decomposition has become a powerful and widespread tool for handling high-dimensional data and computations in the fields of machine learning and artificial intelligence. The essence of the method is as follows. Suppose a fit is sought for the function F of d parameters. First, the problem is discretized on a grid of N nodes per parameter, which generates a multidimensional table containing p = N^d values. For the specific problem considered in the present study, d = 5 and N = 100, so p = 10^10, and direct enumeration of all these values is not realistic.
The choice of N = 100 nodes per parameter is justified by the following rationale. A coarser grid (e.g., N = 50) was found to sometimes miss the finer details of the objective function landscape, leading to suboptimal solutions. A much finer grid (e.g., N = 250) would enlarge the search space by a factor of 2.5^5 ≈ 98 without a corresponding increase in the physical accuracy of the model, thus wasting computational resources. The choice of N = 100 represents a practical balance between numerical precision and the computational cost of the optimization, allowing PROTES to converge efficiently within our budget of m = 1000 function evaluations. The selected discretization level is therefore considered sufficiently fine to capture the core physics of the problem without introducing computational waste or the risk of overfitting.
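The size argument above can be checked with a few lines of arithmetic. The following is a minimal sketch that compares the full table of N^d values with the storage of a tensor-train representation; the TT-rank r = 5 is an illustrative assumption and is not a value reported in this study.

```python
# Size of the full discretized table versus a tensor-train (TT) representation.
d = 5      # number of fitted parameters (from the text)
N = 100    # grid nodes per parameter (from the text)
r = 5      # TT-rank: an assumed illustrative value, not taken from the study

full_table = N ** d  # every combination of grid nodes

# A TT representation stores d cores of shape (r_left, N, r_right),
# with both boundary ranks equal to 1.
ranks = [1] + [r] * (d - 1) + [1]
tt_params = sum(ranks[i] * N * ranks[i + 1] for i in range(d))

print(full_table)  # 10000000000
print(tt_params)   # 8500
```

Under this assumed rank, the TT storage grows linearly with d, while the full table grows as N^d.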
Since direct enumeration is not feasible, an iterative process is arranged at the next step in such a way that K samples out of the p values are generated randomly, and the corresponding values of the function F are calculated for these samples. The learning stage is as follows: a certain smooth distribution function, having the special form of a TT-decomposition (tensor train [30]), is generated based on the separately calculated values. TT tensor representations allow various operations of computational mathematics, such as gradient calculation, to be performed quickly and efficiently.
Aiming at increasing the probability of finding the minimum value, PROTES makes several (kgd) steps along the gradient in order to correct the distribution function. The iterations are then repeated, with new samples generated from the corrected distribution function. The iteration process stops once the total number of objective function evaluations (m) reaches the preset budget. It is expected that for large m the distribution function closely approximates a delta function with a pronounced peak at the minimum of the objective function.
2. Mathematical Problem Formulation
From the mathematical point of view, the required procedure reduces to the solution of a coupled problem consisting of the set of kinetic equations describing the processes occurring at the surface and the boundary value problem for the integro–differential equation that models the distribution of charge inside the nanoparticle and on its surface. In line with this strategy, the objective function F(p1, p2, …, pd), which reflects the deviation between the calculated and experimental temperature dependencies of the sensor effect, is constructed. Minimization of this function delivers the unknown physical parameters of the system. The above procedure is automated by a dedicated program built on the global optimization method PROTES (version 0.3.11). The use of PROTES cuts down the parameter adjustment time while making the solution of the inverse problem more accurate and reproducible. This opens new possibilities for modeling complex nanocomposite systems, as well as for quantifying the influence of temperature, nanoparticle size, and component concentrations on the electronic structure and sensor characteristics of the material.
The model includes two coupled sets of differential equations. The first one is a set of kinetic equations describing processes on the surface of the metal oxide nanoparticle (for example, In2O3), taking into account electron capture from the volume by an oxygen atom. The second is an integro–differential equation describing the distribution of the conduction electrons nc(r) along the radius of the nanoparticle.
The sensor effect θ is given by
Here nc(R0) is the electron concentration at r = R0, where R0 is the nanoparticle radius. To calculate the sensor effect, both sets of equations must be solved twice, that is, in the absence of hydrogen (), as well as in its presence ().
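Assuming, as stated later in the text, that θ is the ratio of the surface electron concentrations in the two gas environments, the definition can be written schematically as below. The superscripts (H2) and (0), denoting the states with and without hydrogen, are notation introduced here for illustration only; the orientation of the ratio is chosen so that θ = 1 when hydrogen has no effect and θ > 1 when the conductivity increases, in line with the experimental definition θ = Ri/Rg.

```latex
\theta = \frac{n_c^{(\mathrm{H_2})}(R_0)}{n_c^{(0)}(R_0)}
```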
Let us assume the following kinetics of electron capture from the nanoparticle volume and its return upon interaction with adsorbed oxygen, supplemented by the reaction with hydrogen:
where
is an active adsorption site on the nanoparticle,
is oxygen molecule in the gas phase,
is adsorbed oxygen molecule,
is conduction electron,
is negatively charged oxygen atom,
is hydrogen molecule in the gas phase, and
is water molecule in the gas phase.
The following set of kinetic equations corresponds to the reactions (2)–(4):
Here
and
are the unknown surface densities of O2 molecules and negative ions (anions) O−, respectively. The set of Equations (5) contains several physical parameters, namely the rate constants of the processes occurring at the surface of the nanoparticle. These rate constants obey the Arrhenius dependence (Table 1). All the calculations were carried out in atomic units [32]. The dimensions of the constants, as well as the units of time, length and energy (th, a0, Eh), are also given in the atomic system.
Here, is power exponent, is pre-exponential factor, is adsorption activation energy, is pre-exponential factor entering the expression for the probability per unit time of oxygen molecule desorption, is binding energy of the oxygen molecule with the surface of indium oxide, is cross-section of electron capture into the acceptor level of the oxygen atom, is electron mass, is frequency factor, is activation barrier, A is pre-exponential factor, is activation barrier, is Boltzmann constant, and T is temperature.
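As an illustration of the Arrhenius dependence in atomic units, the following sketch evaluates the generic form k = A exp(−E/(kB T)). It does not reproduce the specific expressions of Table 1; the function name `arrhenius` and the barrier of 0.02 hartree are hypothetical and chosen only for illustration.

```python
import math

# Boltzmann constant in atomic units (hartree per kelvin).
K_B_HARTREE_PER_K = 3.166811563e-6


def arrhenius(prefactor, activation_energy, temperature):
    """Generic Arrhenius rate constant k = A * exp(-E / (k_B * T)).

    prefactor          -- pre-exponential factor A (units of the rate constant)
    activation_energy  -- activation barrier E in hartree
    temperature        -- absolute temperature T in kelvin
    """
    exponent = -activation_energy / (K_B_HARTREE_PER_K * temperature)
    return prefactor * math.exp(exponent)


# Hypothetical barrier of 0.02 hartree (about 0.54 eV), for illustration only.
for T in (693.0, 793.0):
    print(T, arrhenius(1.0, 0.02, T))
```

Working in hartree and kelvin keeps the exponent dimensionless without further unit conversion.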
The set of Equations (5) is solved under steady-state conditions, that is, under the assumption of electron and adsorption equilibrium. This is justified by the fact that the rate constants of all the considered processes are much larger than the reciprocal of the sensor response time. It should be noted that the unknown function
, which in the absence of hydrogen in the system is determined by the rates of filling and depletion of oxygen traps, can be easily found as
. In order to determine the distribution of conduction electrons nc(r) along the nanoparticle radius, it is necessary to solve the following nonlinear integro–differential equation [33]
Here nv is the donor concentration, εv is the energy of the donor level, m* is the effective electron mass, εr is the relative dielectric permittivity, δ is the thickness of the layer with oxygen traps, V is the volume of the spherical layer from R0 to R0 + δ, is constant, which corresponds to uniform anion distribution within the layer with oxygen traps. The unknowns are , , φ(r) (electrostatic potential), and C1 is the number which has to be found to enforce the condition of electrical neutrality.
The problem with the boundary conditions [13,34]
is considered for Equation (6).
While solving the boundary value problem (BVP) (6) and (7) one also has to enforce the condition of electrical neutrality, that is, the number of electrons generated by donors must be equal to the sum of electrons within the conduction band and oxygen traps
The sets of Equations (5) and (6) are coupled, since set (5) contains the value of the electron concentration in the vicinity of the nanoparticle surface
, while Equation (6) contains variable
which depends on
. The major reason for this coupling is the possibility of electron exchange between the volume of the nanoparticle and its surface. On the one hand, electrons are distributed along the radius of the nanoparticle, while on the other hand, those electrons located near the surface may reach that surface. For this reason, the distribution of electrons along the radius of the nanoparticle exhibits a typical “boundary layer” located near the surface, as shown later in the paper. The boundary conditions (7) were obtained and fully explained in publications [14,34]. The value
will be considered from now on as a parameter and denoted as “a”. The value a must be found in the process of the solution of the whole problem in order for the set of Equations (5) to be compatible with the solution of Equation (6).
4. Numerical Solution
Modeling the sensor properties and electronic structure of semiconductor nanoparticles includes solutions of both the direct and the inverse problems. The direct problem consists of the determination of the sensor effect θ(T) as a function of temperature at given input values of the physical parameters. The solution of the inverse problem delivers the set of physical parameters such that, when used as an input into the direct problem, it provides a good approximation of the experimentally measured sensor effect θexper(T). An algorithm for the numerical solution of the direct problem has been discussed in detail in another publication [34]. The same paper demonstrated that instead of BVP (6) and (7) with the right-hand side (RHS) Φ(φ(r)), which is discontinuous at r = R0, one may solve Equation (6) within the nanoparticle only, i.e., for r ∈ [0, R0], but with the modified boundary conditions
The boundary condition (10) was obtained for the first time in [34]. This condition is derived by solving the differential Equation (6) analytically on the interval R0 < r < R0 + δ, where the right-hand side of the equation is constant. Matching this solution with the “inner” solution on the interval 0 < r < R0, while enforcing continuity of the solution and its derivatives as well as the condition of electrical neutrality (8), produces the condition (10).
For the inverse problem, the following objective function F(p1, p2, …, pd) (where d is the number of parameters) is introduced, measuring the squared deviation of the theoretical curve from the experimental one at the temperatures Ti = 693 + 20(i − 1) K, i = 1, …, M, with M = 6.
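The temperature grid and a squared-deviation measure can be sketched as follows. This assumes that the objective is a plain, unweighted sum of squared differences over the M temperatures; any normalization or weighting used in the original formula is not reproduced here, and the function name `squared_deviation` is hypothetical.

```python
M = 6
# Measurement temperatures in kelvin: T_i = 693 + 20 * (i - 1), i = 1..M.
temperatures = [693 + (i - 1) * 20 for i in range(1, M + 1)]


def squared_deviation(theta_model, theta_exper):
    """Sum of squared differences between model and measured sensor effect."""
    if len(theta_model) != len(theta_exper):
        raise ValueError("both curves must be sampled at the same temperatures")
    return sum((tm - te) ** 2 for tm, te in zip(theta_model, theta_exper))


print(temperatures)  # [693, 713, 733, 753, 773, 793]
```

The grid spans exactly the experimental range from 693 K to 793 K in steps of 20 K.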
The inverse problem consists of finding the minimum of the objective function (11). The chemical reaction rates
and
depending on the five parameters
(see Table 1), play a significant role in calculating the sensor effect.
The sensor effect of In2O3 was measured in a dedicated set of experiments. In post-processing these experiments (within the temperature range from 693 K to 793 K), it was assumed that the objective function depends on the above five arguments only; all the other physical parameters were fixed. Physical reasoning provides the following ranges for the arguments: 0 < p1 < 2, 0 < p2 < 0.1, 0 < p3 < 10^−5, 10^−8 < p4 < 2 × 10^2, 0 < p5 < 0.04. The search for the minimum of the objective function (11) has been performed within these ranges.
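Because the optimizer works on a discrete grid, each vector of integer indices has to be mapped to physical parameter values inside the ranges listed above. The sketch below shows one way to do this; the uniform spacing and the use of cell midpoints are assumptions made here for illustration (the study does not state the spacing, and a logarithmic scale could be more natural for p4), and the name `index_to_params` is hypothetical.

```python
N = 100  # grid nodes per parameter (from the text)

# Search ranges (lower, upper) for p1..p5, as listed in the text.
BOUNDS = [
    (0.0, 2.0),
    (0.0, 0.1),
    (0.0, 1e-5),
    (1e-8, 2e2),
    (0.0, 0.04),
]


def index_to_params(indices, bounds=BOUNDS, n=N):
    """Map integer grid indices (0..n-1) to parameter values.

    Cell midpoints of a uniform grid are used, so the endpoints of the
    open intervals are never sampled. Uniform spacing is an assumption.
    """
    return [lo + (i + 0.5) * (hi - lo) / n
            for i, (lo, hi) in zip(indices, bounds)]


print(index_to_params([0, 49, 99, 0, 50]))
```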
Note that if the reaction rate constant
is zero or negligibly small
≪ 1, that is, the presence of H
2 molecules has no effect, then
and the value of the sensor effect is 1 (see Equation (1)). In this case the objective
F will have approximately the same values in all those points of the five-dimensional space whose projections belong to the region
≪ 1, i.e., the function
F will possess some kind of plateau. The existence of such a plateau significantly complicates the application of gradient descent methods. The gradient-free optimization method PROTES [29] is applied to the objective function F(p1, p2, …, pd) (11) in order to overcome this problem.
Indeed, standard gradient-based methods may fail here because the gradient is zero, and many global optimizers (such as some evolutionary algorithms) can stagnate. PROTES handles this through its probabilistic sampling and adaptive exploration mechanism. (1) The algorithm begins by sampling the entire parameter space (including the plateau) from a near-uniform distribution. While many samples on the plateau yield the same high value of F, this initial phase is crucial for mapping the flat region. (2) PROTES does not select only the single best point, but the top-k best points from its batch of samples (k = 5 in our case). Even on a plateau, there will be minor numerical variations or very slight slopes leading away from it, and PROTES exploits these small differences. (3) PROTES then updates its internal probability tensor to increase the likelihood of sampling around these top-k points. Crucially, it takes kgd steps along the gradient of the log-likelihood (not the gradient of F). The search is thereby focused on the neighborhoods that have shown even a slight improvement, which moves it off the flat plateau and down the slopes towards the region where is larger and begins to affect the sensor response.
We note that the objective function F (Equation (11)) was used as-is, without adding explicit regularization terms. The structure of the physical model itself provides an implicit constraint, as parameters must lie within physically meaningful bounds. The primary mechanism to prevent premature convergence was not to use an “early stopping” criterion at all. Instead, we relied on a fixed computational budget of m = 1000 evaluations of the expensive objective function. This budget was chosen to be sufficiently large based on the problem’s dimensionality (d = 5) and the observed convergence behavior in preliminary tests. PROTES was allowed to run until it exhausted this budget, ensuring it had enough time to explore the flat region and subsequently converge to the optimum.
In summary, PROTES is well suited to handling plateau-like behavior of the objective function because it transforms the high-dimensional optimization into a problem of learning a probability distribution. Its efficiency in flat regions stems from its ability to use even small differences in the objective function to guide this learning process, without relying on gradients of the objective function itself.
PROTES is particularly efficient for high-dimensional discrete problems where an exhaustive search is computationally infeasible, such as in our case with d = 5 parameters. This method represents multidimensional arrays (tensors) as a sequence of small three-dimensional interconnected tensors (cores), which greatly reduces computational and memory requirements. The TT-based approach can be successfully applied to various problems of physics and computational mathematics, including calculation of multidimensional convolutions [35,36], approximation of multidimensional integrals and integrals dependent on parameters [37,38], solution of differential equations [39,40], acceleration of artificial neural networks [41,42], etc.
The PROTES method takes advantage of the low-rank TT-approach and allows, as shown in [29], gradient-free multidimensional optimization to be performed for a wide class of functions with a smaller evaluation budget than popular alternative approaches. This method and its several modifications [31] have been successfully applied to various gradient-free multidimensional optimization problems [43,44]. Fundamentally, PROTES operates as an artificial intelligence technique by leveraging machine learning principles for adaptive optimization. Its core mechanism employs gradient-based updates to the probability tensor, effectively transforming a high-dimensional search into a trainable probabilistic model that converges toward the global optimum. The previously described advantages of the TT-based approach and the examples of its successful application to a wide class of problems determined our choice of the PROTES optimizer to solve the inverse problem in automatic mode.
Figure 1 presents a schematic of the PROTES algorithm for the present case.
The PROTES algorithm is executed iteratively:
(1) Initialization. A discrete probability distribution p over the d-dimensional parameter space is initialized randomly, favoring uniform exploration (i.e., for the initial queries, candidates for the optimum are generated randomly from a distribution close to uniform). The efficiency in high dimensions stems from the compact representation of the discrete array (tensor) p in a structured low-rank TT-format [30], in which the number of parameters of the tensor representation depends linearly on the dimension d of the optimization problem.
(2) Sampling. At each iteration, K random sets of parameters for l = 1, 2, …, K are sampled from the tensor p with a probability proportional to its corresponding values. At the first iteration this corresponds to sampling from a random distribution; at subsequent iterations the adjusted distribution function is used. Owing to the TT-format, sampling is carried out efficiently, with linear complexity in terms of dimensionality.
(3) Computation. The forward model is solved for each set of parameters l = 1, 2, …, K, and the objective function values F (quantifying the mismatch with experiment) are computed.
(4) Selection and Update. The k samples with the lowest values of F (i.e., the best fit to the data) are identified; the distribution p is then updated to increase the probability of sampling these top-k candidates and their neighbors in subsequent iterations. This update is performed efficiently using kgd gradient-ascent steps on the log-likelihood of the selected samples, leveraging automatic differentiation according to the machine learning paradigm.
(5) Termination. Steps (2)–(4) are repeated until a predefined computational budget m (maximum number of objective function evaluations) is reached. As the iterations progress, p evolves from a broad distribution into a highly peaked one centered near the optimal parameter set, , and in this case, the approximate value of the optimal parameters will be among the sets sampled during the optimizer iterations.
Computer code, combining PROTES with calculations of objective function, has been developed based on the above algorithm. The following PROTES parameters were used: K = 20, m = 1000, and k = 5.
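To make the sample, select and update loop concrete, the following sketch implements a deliberately simplified stand-in in pure Python. It is not the PROTES implementation: the tensor-train distribution is replaced by a fully factorized one (equivalent to TT-rank 1, so correlations between parameters cannot be represented), the learning rate `lr` and the toy objective are assumptions, and the names `factorized_topk_search` and `toy` are hypothetical. Only K = 20, k = 5 and m = 1000 are taken from the text.

```python
import math
import random


def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def factorized_topk_search(f, d, n, m, K=20, k=5, k_gd=1, lr=0.5, seed=0):
    """Rank-1 stand-in for the sample / select / update loop.

    f -- objective taking a list of d integer grid indices
    d -- number of parameters; n -- grid nodes per parameter
    m -- total budget of objective evaluations
    """
    rng = random.Random(seed)
    logits = [[0.0] * n for _ in range(d)]  # uniform initial distribution
    best_i, best_y, used = None, math.inf, 0
    while used + K <= m:
        probs = [softmax(row) for row in logits]
        # Sampling: K index vectors drawn from the current distribution.
        batch = [[rng.choices(range(n), weights=probs[j])[0] for j in range(d)]
                 for _ in range(K)]
        # Computation: one objective evaluation per sample.
        values = [f(idx) for idx in batch]
        used += K
        order = sorted(range(K), key=lambda s: values[s])
        if values[order[0]] < best_y:
            best_i, best_y = batch[order[0]], values[order[0]]
        # Selection and update: k_gd gradient-ascent steps on the
        # log-likelihood of the k best samples of this batch.
        elite = [batch[s] for s in order[:k]]
        for _ in range(k_gd):
            probs = [softmax(row) for row in logits]
            for j in range(d):
                for c in range(n):
                    hits = sum(1 for idx in elite if idx[j] == c)
                    logits[j][c] += lr * (hits - k * probs[j][c])
    return best_i, best_y


def toy(idx):
    """Toy objective with its minimum at index 30 in every dimension."""
    return float(sum((i - 30) ** 2 for i in idx))


best_i, best_y = factorized_topk_search(toy, d=5, n=100, m=1000)
print(best_i, best_y)
```

With K = 20 and m = 1000 the loop performs m/K = 50 iterations. In the actual study each call to the objective would be a full solution of the direct problem rather than a toy function.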
The PROTES hyperparameters were selected to balance exploratory search against the computational cost of each function evaluation (one evaluation requires a full solution of the direct problem). The batch size K = 20 and the number of best samples k = 5 lie within the range of standard values recommended in the method’s literature [29] for efficient sampling and updating. The total budget was set to m = 1000 evaluations, which provided a tractable runtime and proved sufficient for convergence to a solution in satisfactory agreement with experiment, at a dramatically lower time cost than manual fitting.
As mentioned earlier, evaluating the objective function F (Equation (11)) requires solving the entire coupled set of equations twice: once in the reference state () and once in the sensing state (). This is an inherent requirement of the physical model, as the sensor effect θ is defined as the ratio of electron concentrations in these two distinct chemical environments (Equation (1)). A single evaluation of F on our test workstation (Intel Xeon CPU, 32 GB RAM; Intel Corporation, Santa Clara, CA, USA) takes approximately 20–25 s. This time is dominated by the iterative procedure to find the compatible parameter a (Section 3), which itself requires multiple calls to the BVP solver for each of the two gas states. Therefore, the total cost of one objective function evaluation is roughly double that of solving the system for a single gas state.
While further optimizations (e.g., using a faster root-finding method such as Brent’s, or GPU acceleration for the BVP) are possible, the current implementation, leveraging the algorithmic robustness and built-in parallelization of PROTES, strikes a good balance between speed, reliability, and code complexity for the problem at hand. The 4–6 h runtime for a full optimization is considered highly efficient compared to the days required for manual fitting.
Parallelization of the direct-problem solver would certainly cut the solution time considerably, not only because the direct problem must be solved twice, but also because the objective function is calculated at several different temperatures.
Parallelization has not been performed in the present study and is left for future investigations.
5. Comparison of Calculated and Experimental Results
Indium oxide was obtained by hydrothermal synthesis, with the average particle size being about 23.5 nm. According to TEM (Transmission Electron Microscopy) data, indium oxide particles of rhombohedral structure have essentially a spherical form and are homogeneous in size.
To determine the sensor response to hydrogen, the synthesized indium oxide was transformed into a metal oxide film. To achieve this, the powder was ground with distilled water. The resulting suspension was placed onto polycor plates equipped with platinum heaters and contacts for reading electrophysical parameters. The plate covered with the suspension was gradually heated to a temperature of 770 K and kept there until the resistance of the obtained metal oxide film reached a constant value.
Measurements of the H2 sensor responses of the synthesized films were performed in a special set of experiments within a temperature range from 693 K to 793 K. The temperature was maintained to within 1 °C. A chip with the applied sensitive layer was placed within a chamber with a volume of about 1 cm^3. A tested hydrogen–air mixture containing 1100 ppm H2 was pumped through the chamber at a flow rate of 20 mL/min. The variation in sensor resistance was captured using a Keysight digital multimeter (Keysight Technologies, Inc., Santa Rosa, CA, USA) and recorded by computer. Specialized software delivered the kinetic curve of the resistance variation. Based on these data, the sensor response θ = Ri/Rg was determined, where Ri is the initial value of the sensor resistance (prior to mixture injection) and Rg is its minimal value (achieved after injection).
Figure 2 presents the results of the experimental data post-processing in the range from 693 K to 793 K using PROTES. The post-processing provides the following values of the physical parameters:
Figure 2 demonstrates that for the obtained parameters (12), the computed curve follows the experimental one qualitatively; quantitatively, its deviation from the experimental curve does not exceed 5%. This measure of the deviation is the NRMSE (Normalized Root Mean Square Error) between the calculated and experimental values over the whole temperature range considered; its exact value is 3.5%. Recall that each invocation of the objective function includes the solution of the BVP (6)–(10), enforcement of the condition of electrical neutrality, and adjustment of the parameter a. The accuracy of the approximation may be improved by significantly increasing the budget m used by PROTES, at the cost of a correspondingly longer adjustment time.
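For reference, one common way of computing an NRMSE is sketched below. The normalization by the mean of the measured values is an assumption made here; the study does not state which normalization (mean, range, or other) was used, and the numbers in the example are made up and are not the measured data.

```python
import math


def nrmse(model, exper):
    """Root-mean-square error divided by the mean of the measured values."""
    if len(model) != len(exper) or not exper:
        raise ValueError("curves must be non-empty and of equal length")
    mse = sum((m - e) ** 2 for m, e in zip(model, exper)) / len(exper)
    return math.sqrt(mse) / (sum(exper) / len(exper))


# Made-up numbers for illustration only; these are not the measured data.
print(nrmse([1.1, 2.1, 2.9], [1.0, 2.0, 3.0]))
```

In this example every point deviates by 0.1 and the mean of the reference values is 2, so the result is close to 0.05, i.e., 5%.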
Apart from the comparison of computed and experimental data presented in Figure 2, the parameters obtained with the gradient-free optimization method PROTES provide the data required to reconstruct the electron distribution within the nanoparticle, e.g., along its radius (Figure 3).
For example, it is easy to see that the concentration of conduction electrons within the interval 0 ≤ r/R0 < 0.85 does not change, while it starts to drop upon approaching the surface (Figure 3). This results from the possibility of electrons escaping from the volume to the surface of the nanoparticle, where they are captured by adsorbed oxygen atoms. The lack of electrons in the layer adjacent to the surface results in the emergence of an internal electric field within the nanoparticle. This field repels free electrons back toward the center of the nanoparticle [14,16]. Note that the distribution of conduction electrons within the nanoparticle is a function of temperature, the radius of the nanoparticle, and the relation between the energies of the donor levels, the bottom of the conduction band and the binding of electrons to adsorbed oxygen atoms.