Precision Modeling: Application of Metaheuristics on Current–Voltage Curves of Superconducting Films

Given the importance of current–voltage curves in superconductivity, it has recently been argued that approximating them, rather than measuring them incessantly, is a more viable option. This is especially true when the curves must be recorded over a wide range of critical parameters, including temperature and magnetic field, which makes measurement a tedious and repetitive procedure. Artificial neural networks have recently been put forward as one methodology for approximating these so-called electrical measurements for various geometries of antidots on a superconducting thin film. In this work, we demonstrate that the prediction accuracy, in terms of mean-squared error, achieved by artificial neural networks is rather limited and, because of their heavy reliance on randomly generated network coefficients, may vary widely across geometries, experimental conditions, and the networks' own tunable parameters. We resolve this inconsistency by controlling the uncertainty in network initialization and coefficient generation by means of a novel entropy based genetic algorithm. The proposed method substantially improves the accuracy and consistency of current–voltage curve prediction compared with existing works, and is applicable to various geometries of antidots, including rectangular, square, honeycomb, and kagome, on a superconducting thin film.


Introduction
Precisely measuring the critical current density in superconducting materials and systems requires an understanding of the fundamentals of current–voltage (IV) characteristics, also called transport measurements [1]. Recent studies have shown that these IV curves may exhibit sudden voltage jumps, resembling Shapiro steps, around the critical current ($I_c$) and/or the critical temperature ($T_c$) in superconducting thin films. These steps usually appear when the vortex lattice is formed, which may lead to instability at high vortex velocities. Such instabilities have been studied in different systems, including nanotube Josephson junctions [2], superconducting nanowires [3], low-temperature thin films [4], and a square array of periodic antidots on an Nb film [5]. Studying this behavior in superconducting films is important because Shapiro steps are exploited to make flux qubits, which are essential for superconducting quantum computers [6]. Other mechanisms that may lead to such jumps include thermomagnetic instabilities of vortex matter [7], thermally assisted flux flow [8], or simply overheating of the superconducting film on the SiO₂ substrate.
Fabricating various geometries by electron-beam lithography in order to obtain transport measurements with a physical property measurement system (PPMS) is an expensive, tedious, and cumbersome process [9]. It has recently been pointed out that these curves, especially when they are needed over a wide range of temperature and magnetic field values, need not be measured incessantly; instead, they may be obtained by applying an approximation technique to a finite set of curves already measured with the PPMS [10][11][12]. Each of these solutions used Artificial Neural Networks (ANN) as the approximation method to extrapolate the IV curves for unseen values of the critical parameters. As explained in Section 2.2, training an ANN requires two sets of randomly generated network coefficients, called weights and biases, which are updated in each iteration until an optimal solution (with the smallest mean-squared error, MSE) is obtained. Mainly because of this randomness in coefficient generation, an ANN tends to converge to one of many possible local minima, which makes ANN training a nondeterministic process. This, in turn, may lead to widely varying values of MSE, number of iterations, and convergence time, so that two training runs on the same data may yield different prediction accuracies.
The proposed work addresses this problem by controlling the randomness in coefficient generation, by means of a novel entropy based genetic algorithm, so that the ANN always converges to (or close to) the global minimum. Our case study is the prediction of IV curves for four different geometries of antidots, namely rectangular, square, honeycomb, and kagome, on an Nb film. However, the proposed approach is equally applicable to other superconducting films, and as an alternative approximation technique for measuring other properties. The main contribution of this work, therefore, is increased accuracy in the prediction of IV curves by means of entropy, which, in a nutshell, measures the uncertainty associated with the initialization of the weights and biases. Since ANNs depend heavily on their initial conditions for fast convergence and accurate approximation, the selected vector of weights and biases should have minimum entropy. To the best of our knowledge, entropy, especially in conjunction with a genetic algorithm (GA), has never been adopted for predicting the IV curves of a superconducting film; we show that it outperforms the conventionally used predictors.
The rest of the paper is organized as follows: Section 2 covers our experimental setup and a brief overview of existing methodologies and relevant works. We present the problem formulation in Section 3. Section 4 presents the proposed methodology and an overview of GA. In Section 5, we present simulation results and a comparative analysis of the algorithms used, before concluding the paper in Section 6.

Related Work and Experimental Setup
To draw a fair comparison between the proposed methodology and the existing works, we adopted the same experimental setup and a superconducting film of the same thickness. In what follows, we give an overview of the experimental setup and the fundamentals of ANNs, largely drawn from the benchmark works [10][11][12].

Experimental Setup and Measurements Using PPMS
For our experiments, we deposited a high-quality 60 nm Nb film on a SiO₂ substrate, followed by fabrication of micro-bridges by ultraviolet photolithography and dry etching in order to obtain transport measurements. The desired geometries of circular antidots were obtained by applying e-beam lithography to a photoresist layer. Finally, magnetically enhanced dry etching transferred the patterns to the film. A commercially available PPMS was used to perform the transport measurements. Figure 1 presents scanning electron microscopy (SEM) images of the various geometries. In the IV curves measured at different values of temperature and magnetic field, shown in Figure 2, we observed Shapiro steps around $T_c$. However, these steps weakened with increasing temperature until they vanished completely. These temperature-dependent curves can be divided into three regions: a region comprising the Shapiro steps lies between two regions, in each of which the curve shows some slope. We repeatedly varied the magnetic field while keeping the current constant, and vice versa, to collect 25,600 samples arranged in a 4 × 4 × 1600 matrix for each geometry, where 4 × 4 refers to the four constant values of current and of magnetic field. In the proposed work, we isolated one sample from each matrix and used the rest to train the ANN model. Once the model is trained, the isolated sample is compared with the predicted curve for the same values of magnetic field and temperature.
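The data layout and hold-out split described above can be sketched as follows. This is a minimal illustration, assuming the 25,600 samples of one geometry arrive as a flat list in (4, 4, 1600) order; the helper names `reshape_samples` and `hold_out_curve` are hypothetical, and the values used below are synthetic placeholders rather than measured data.

```python
from typing import List, Tuple

N_A, N_B, N_POINTS = 4, 4, 1600  # 4 x 4 constant settings, 1600 points per curve

Curve = List[float]


def reshape_samples(flat: List[float]) -> List[List[Curve]]:
    """Arrange a flat list of 25,600 samples into a 4 x 4 x 1600 nested list."""
    expected = N_A * N_B * N_POINTS
    if len(flat) != expected:
        raise ValueError(f"expected {expected} samples, got {len(flat)}")
    return [
        [flat[(a * N_B + b) * N_POINTS:(a * N_B + b + 1) * N_POINTS] for b in range(N_B)]
        for a in range(N_A)
    ]


def hold_out_curve(data: List[List[Curve]], a: int, b: int) -> Tuple[Curve, List[Curve]]:
    """Split off the curve at setting (a, b); the other 15 curves form the training set."""
    test = data[a][b]
    train = [data[i][j] for i in range(N_A) for j in range(N_B) if (i, j) != (a, b)]
    return test, train


# Synthetic placeholder values, only to show the shapes involved.
data = reshape_samples([float(k) for k in range(N_A * N_B * N_POINTS)])
test_curve, train_curves = hold_out_curve(data, 2, 1)
print(len(test_curve), len(train_curves))  # 1600 15
```

The held-out curve plays the role of the isolated sample: it never enters training and is only compared against the prediction afterwards.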

Approximation Using ANN
ANNs loosely imitate the human brain: they are meant to learn in a given situation and repeat that behavior in another [13]. They establish relationships between independent variables over a small subset of data, and are widely used to approximate a larger sample space by exploiting those relationships. An ANN architecture is essentially an interconnection of neurons [14] in a directed graph, mapping an input layer to an output layer via one or more hidden layers sandwiched in between. The neurons in each layer carry real-valued weights and biases, together called the network's coefficients. A different set of weights, which are updated in each iteration of the training process, leads to a different network response. Many training algorithms exist in the literature, each of which updates the coefficients in its own way, but most of them backpropagate the error from the output layer to the input layer in order to minimize the objective function. The goal of this training, or learning, process is to achieve the smallest mean-squared error (MSE) between the target and the actual system response [15]. The response of a network with only one hidden layer is given by Equation (1):
$$ y = \delta_o\left( \sum_{j=1}^{h} w^{o}_{j}\, \delta_H\left( \sum_{i=1}^{n} w^{H}_{ji}\, x_i + b^{H}_{j} \right) + b^{o} \right) \quad (1) $$
where $\delta_o$ and $\delta_H$ are the threshold (activation) functions of the output and hidden layers, respectively, $w$ and $b$ represent weights and bias terms, respectively, $x_i$ is the $i$th element of the input layer, and $n$ and $h$ are the numbers of input and hidden neurons. Without going into the details of the available training algorithms, in this work we restrict ourselves to those that were used for the same purpose in the benchmark works. The first work [12] to approximate the IV curves of an Nb film used the Bayesian Regularization (BR) algorithm to train the ANN for a square array of antidots. The algorithm converged in 55 iterations (epochs) and achieved a best MSE of 2.08 × 10⁻⁸. This work did not explore other options, either in terms of training algorithms or ANN architectures.
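A single-hidden-layer forward pass in the spirit of Equation (1) can be sketched as below. The choice of tanh for the hidden activation $\delta_H$ and the identity for the output activation $\delta_o$ is an assumption for illustration (a common pairing for regression), as are the function name `forward` and the numeric weights in the usage example.

```python
import math
from typing import List, Sequence


def forward(x: Sequence[float], w_hidden: List[List[float]], b_hidden: Sequence[float],
            w_out: Sequence[float], b_out: float) -> float:
    """One-hidden-layer network response.

    Hidden neuron j computes tanh(sum_i w_hidden[j][i] * x[i] + b_hidden[j]);
    the output is the identity of sum_j w_out[j] * hidden[j] + b_out.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical example: two inputs, three hidden neurons, one output.
y = forward([0.5, -0.2],
            w_hidden=[[0.1, 0.4], [-0.3, 0.2], [0.7, -0.5]],
            b_hidden=[0.0, 0.1, -0.1],
            w_out=[1.0, -1.0, 0.5],
            b_out=0.2)
print(round(y, 4))
```

Every weight and bias in this function is one of the "network coefficients" whose random initialization the proposed method seeks to control.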
The second benchmark work [10], in contrast, did the same for a diluted square lattice and presented a thorough comparison between three ANN architectures, each with ten different configurations (numbers of neurons in each layer) and trained by six training algorithms. In total, 180 ANN models were developed, and the best MSE obtained was 4.55 × 10⁻⁸. Figure 3 shows the MSE obtained by all the models, where cascaded, feedforward, and layer recurrent are the three ANN architectures; note the diversity in the results, which is mainly due to the random nature of the generated coefficients.
The last benchmark work [11] extended the second one by comparing four geometries of antidots on the same Nb film. The lowest MSE reported in this work was on the order of 10⁻⁹, for a specific architecture and a specific training algorithm. However, the results were once again enormously diverse and nondeterministic.
Apart from these benchmark works, there have been a few notable attempts to propose formal models for analyzing the IV curves of superconducting devices and films. For instance, a formal, self-consistent model was presented for estimating the critical current of superconducting devices [16]. The authors acknowledged that a thin film with an array of antidots is very difficult to model mathematically. Although the computational burden of their model was lower than that of ANN based models, it was less accurate, with a tolerance of 4–6% relative to the actual values, compared with less than 1% for the ANN models. Another ANN based attempt was presented by Bonanno et al. [17], who proposed a radial basis function neural network (RBFNN) and demonstrated an MSE of about 10⁻¹, which is considerably worse than the existing techniques, let alone our current model based on a genetic algorithm, which achieves an MSE on the order of 10⁻⁹.

Problem Statement
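Let $\Omega \subset \mathbb{R}^{\nu}$, where $\nu = 2$ and $\nu \to \{R \times D\} \mid R > D$, be the set of experimental values. Let the selected feature set be $\varphi = \{\varphi(i) \mid i \in \Omega\}$, where the features $\varphi_1(i), \dots, \varphi_n(i) \subset \varphi$ are used in the ANN training process to predict the output vector $\varphi_{pred}$, so that the features are mapped as $\varphi \to \varphi_{pred}$. The output vector $\varphi_{pred}$ is calculated from the variables $\delta^k_{WB}$, $pop_t$, $\delta_R$, $\delta_s$, $\delta^k_E$, $X_{ofs}$, and $\varphi^y_{pred}$, which are described in the list of symbols. The objective function of the ANN is the MSE:
$$ \mathrm{MSE} = \frac{1}{N} \sum_{y=1}^{N} \left( \varphi^{y} - \varphi^{y}_{pred} \right)^2 \quad (2) $$
where $N$ is the number of samples and $\varphi^{y}$ is the $y$th measured value.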
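The objective in Equation (2) is the ordinary mean-squared error between measured and predicted values. A minimal sketch, assuming both curves are sampled at the same points (the function name `mse` is illustrative):

```python
from typing import Sequence


def mse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean-squared error between a measured curve and its prediction."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("curves must be non-empty and of equal length")
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)


print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1.0) / 3
```

The same function can serve both as the ANN training objective and as the GA fitness, since both are minimized.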

Genetic Algorithms
Genetic Algorithm (GA) is an evolutionary technique belonging to the class of stochastic search algorithms, which finds an optimal solution from a pool of candidate solutions based on the principle of survival of the fittest [18]. In the GA framework, each individual is represented by a string called a chromosome, and a group of chromosomes forms a population. In our architecture, a vector of weights and biases (a chromosome) is generated and replicated multiple times to form a population of a predefined size.
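Encoding a network's weights and biases as one chromosome, and replicating it into a population, might look as follows. The flattening order (layer by layer, weight rows before biases), the uniform initialization in [-1, 1], and the helper names are assumptions for illustration. Because all copies are identical at the start, diversity in this sketch has to come from the crossover and mutation operators.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights [n_out][n_in], biases [n_out])


def encode(layers: Sequence[Layer]) -> List[float]:
    """Flatten all weights and biases, layer by layer, into one chromosome."""
    genes: List[float] = []
    for weights, biases in layers:
        for row in weights:
            genes.extend(row)
        genes.extend(biases)
    return genes


def decode(genes: Sequence[float], sizes: Sequence[int]) -> List[Layer]:
    """Rebuild per-layer weights and biases from a chromosome, given the layer sizes."""
    layers: List[Layer] = []
    pos = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [list(genes[pos + r * n_in:pos + (r + 1) * n_in]) for r in range(n_out)]
        pos += n_in * n_out
        biases = list(genes[pos:pos + n_out])
        pos += n_out
        layers.append((weights, biases))
    if pos != len(genes):
        raise ValueError("chromosome length does not match layer sizes")
    return layers


def initial_population(sizes: Sequence[int], pop_size: int, seed: int = 0) -> List[List[float]]:
    """One random chromosome replicated pop_size times (independent copies)."""
    rng = random.Random(seed)
    n_genes = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    base = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
    return [list(base) for _ in range(pop_size)]


pop = initial_population([2, 3, 1], pop_size=5)
```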
A binary representation is commonly used in GA, where each chromosome is a vector $c = (c_1, c_2, \dots, c_m)$ of $m$ genes with $c_i \in \{0, 1\}$, and $m$ is the length of the chromosome. In practical optimization, however, it is more natural to represent genes as real numbers [19]; the continuous domain provides a larger search space and more possibilities for convergence. The data are normalized to the range [0, 1] prior to binary encoding and, for this specific application, the chromosome is a floating-point vector. In what follows, we present the GA operators, namely crossover and mutation, implemented for a thorough technical analysis.
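The normalization to [0, 1] can be sketched with min-max scaling. Min-max scaling is an assumed choice here, since the normalization formula itself is not given, and the helper names are hypothetical. Returning the original bounds lets predictions be mapped back to physical units.

```python
from typing import List, Sequence, Tuple


def minmax_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values linearly to [0, 1]; also return (lo, hi) for the inverse mapping."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values], lo, hi  # constant input: map everything to 0
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def minmax_denormalize(scaled: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values in [0, 1] back to the original range [lo, hi]."""
    return [lo + s * (hi - lo) for s in scaled]


scaled, lo, hi = minmax_normalize([2.0, 4.0, 6.0])
restored = minmax_denormalize(scaled, lo, hi)
```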

Crossover Operators
Consider a pair of parent chromosomes $C^1 = (c^1_1, \dots, c^1_m)$ and $C^2 = (c^2_1, \dots, c^2_m)$, which is used below to describe the crossover operators.

Flat Crossover
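Given parent genes $c^1_i$ and $c^2_i$, a single offspring $O = (g_1, \dots, g_m)$ is generated, where each gene $g_i$ is drawn uniformly at random from the interval $[\min(c^1_i, c^2_i), \max(c^1_i, c^2_i)]$; here $i = 1, \dots, m$ indexes the genes, $m$ being the number of genes, and the superscript $k \in \{1, 2\}$ indexes the parent chromosomes.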
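A sketch of flat crossover, assuming its usual real-coded form in which each offspring gene is drawn uniformly from the interval spanned by the two corresponding parent genes. The function name and the optional seeded generator are illustrative choices.

```python
import random
from typing import List, Optional, Sequence


def flat_crossover(p1: Sequence[float], p2: Sequence[float],
                   rng: Optional[random.Random] = None) -> List[float]:
    """One offspring; gene i is drawn uniformly from [min(p1[i], p2[i]), max(p1[i], p2[i])]."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same number of genes")
    if rng is None:
        rng = random.Random()
    return [rng.uniform(min(a, b), max(a, b)) for a, b in zip(p1, p2)]


child = flat_crossover([0.0, 1.0, 0.5], [1.0, 0.0, 0.5], random.Random(1))
```

Where both parents agree on a gene, the offspring inherits that value unchanged; elsewhere it explores only the region between the parents.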

Arithmetic Crossover
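In arithmetic crossover, two offspring $O^k = (g^k_1, g^k_2, \dots, g^k_m)$, $k = 1, 2$, are generated from the parent genes $c^1_i$ and $c^2_i$ as
$$ g^1_i = \lambda c^1_i + (1 - \lambda)\, c^2_i, \qquad g^2_i = \lambda c^2_i + (1 - \lambda)\, c^1_i, \qquad i = 1, \dots, m, $$
where $\lambda$ is a constant, user-defined value that can vary with the number of generations.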
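Arithmetic crossover with a weighting factor λ can be sketched as follows, assuming the common form in which the two offspring are complementary convex combinations of the parents (the function name is illustrative).

```python
from typing import List, Sequence, Tuple


def arithmetic_crossover(p1: Sequence[float], p2: Sequence[float],
                         lam: float) -> Tuple[List[float], List[float]]:
    """Two offspring: o1 = lam*p1 + (1-lam)*p2 and o2 = lam*p2 + (1-lam)*p1, gene by gene."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same number of genes")
    o1 = [lam * a + (1.0 - lam) * b for a, b in zip(p1, p2)]
    o2 = [lam * b + (1.0 - lam) * a for a, b in zip(p1, p2)]
    return o1, o2


o1, o2 = arithmetic_crossover([0.0, 2.0], [4.0, 6.0], lam=0.25)
```

Making `lam` a function of the generation counter is one way to let λ vary over the run.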

Mutation Operator
Let $R_{max}$ denote the maximum number of generations and $R_t$ the generation at which the mutation operator is applied. The mutation rule is defined in terms of these two quantities.
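One mutation operator that depends on $R_t$ and $R_{max}$ in this way is Michalewicz's non-uniform mutation, in which the size of the perturbation shrinks as $R_t$ approaches $R_{max}$. The sketch below assumes that operator; the gene bounds `lo`/`hi`, the shape parameter `b`, and the function name are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence


def nonuniform_mutation(genes: Sequence[float], r_t: int, r_max: int, rate: float,
                        lo: float = 0.0, hi: float = 1.0, b: float = 5.0,
                        rng: Optional[random.Random] = None) -> List[float]:
    """Non-uniform mutation (assumed form): each gene mutates with probability `rate`.

    The step delta(y) = y * (1 - r ** ((1 - r_t / r_max) ** b)), with r ~ U[0, 1),
    moves a gene towards hi or lo and shrinks to zero as r_t approaches r_max.
    """
    if rng is None:
        rng = random.Random()
    decay = (1.0 - r_t / r_max) ** b

    def delta(y: float) -> float:
        return y * (1.0 - rng.random() ** decay)

    out = []
    for g in genes:
        if rng.random() < rate:
            g = g + delta(hi - g) if rng.random() < 0.5 else g - delta(g - lo)
        out.append(g)
    return out


late = nonuniform_mutation([0.2, 0.8], r_t=10, r_max=10, rate=1.0, rng=random.Random(0))
early = nonuniform_mutation([0.2] * 50, r_t=0, r_max=10, rate=1.0, rng=random.Random(3))
```

At the last generation the step is zero, so genes are left unchanged; early on, mutated genes can move anywhere within [lo, hi].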

Selection
Selection is the process of choosing the chromosomes with the smallest cost-function values. The selection rate $\varsigma_{rate}$ defines the fraction of survivors eligible for mating in the next generation. Generally, $\varsigma_{rate} = 50\%$, and the population selection is defined accordingly. The selection probability depends on the cost weight of each chromosome: the chromosome with the lowest cost, in terms of mean-squared error, has the highest selection probability. Other selection methods include roulette-wheel, rank, and tournament selection [21]. Roulette-wheel selection is probability based, whereas tournament selection is a winner-takes-all technique [22].
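The selection step can be sketched as below, assuming a cost-weighted scheme found in the continuous-GA literature: the best $\varsigma_{rate}$ fraction survives, and each survivor's probability is proportional to how far its cost lies below that of the first discarded chromosome. This particular weighting and the helper names are assumptions of the sketch.

```python
import random
from typing import List, Sequence, Tuple


def select(costs: Sequence[float], rate: float = 0.5) -> Tuple[List[int], List[float]]:
    """Return survivor indices (best `rate` fraction) and their selection probabilities."""
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    n_keep = max(1, int(rate * len(costs)))
    survivors = order[:n_keep]
    # Reference cost: the first discarded chromosome (or the worst one if none is discarded).
    ref = costs[order[n_keep]] if n_keep < len(costs) else costs[order[-1]]
    weights = [ref - costs[i] for i in survivors]
    total = sum(weights)
    if total <= 0.0:  # all relevant costs equal: fall back to uniform probabilities
        return survivors, [1.0 / n_keep] * n_keep
    return survivors, [w / total for w in weights]


def pick_parent(survivors: Sequence[int], probs: Sequence[float], rng: random.Random) -> int:
    """Draw one survivor index with the given probabilities."""
    return rng.choices(survivors, weights=probs, k=1)[0]


survivors, probs = select([0.4, 0.1, 0.3, 0.2])  # costs are MSE values
```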

Operation of a GA
The operation of a GA is summarized as follows:
1. Create an initial population from a randomly generated vector of weights and biases.
2. Repeat until the best individuals are selected:
   • Evaluate the fitness using MSE,
   • Select the parents with the best fitness level,
   • Apply the selected crossover and mutation operators.
Optimization is the process of finding the best possible solution within a given search space. In this work, an evolutionary strategy is used to fine-tune the parameters so as to minimize the cost function. Given the strong performance of GAs on various platforms, and their ability to handle large search spaces even with stochastic objective functions, we exploit them in another domain: optimizing artificial neural networks. There are two possible directions: (1) optimize the weights and biases, or (2) optimize the ANN architecture. In this work, the former is used to improve the prediction accuracy.
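Steps 1 and 2 of the GA procedure can be sketched as a generic real-coded GA that minimizes any cost function of a weights-and-biases vector. In this sketch the cost is a toy quadratic standing in for the ANN's MSE; the operators (arithmetic crossover with random λ, Gaussian mutation), the population size, and the elitist survivor scheme are illustrative assumptions, not the exact configuration used in this work.

```python
import random
from typing import Callable, List, Sequence, Tuple


def run_ga(cost: Callable[[Sequence[float]], float], n_genes: int, pop_size: int = 20,
           max_gen: int = 200, tol: float = 1e-8, mut_rate: float = 0.2,
           seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `cost` over real-valued chromosomes with a simple elitist GA (pop_size >= 4)."""
    rng = random.Random(seed)
    # Step 1: initial population of random weight/bias vectors.
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(max_gen):
        # Step 2a: evaluate fitness (lower cost is better) and sort.
        pop.sort(key=cost)
        if cost(pop[0]) < tol:
            break
        # Step 2b: keep the better half as parents (selection rate 50%).
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            lam = rng.random()
            # Step 2c: arithmetic crossover followed by Gaussian mutation.
            child = [lam * a + (1.0 - lam) * b for a, b in zip(p1, p2)]
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = parents + children
    pop.sort(key=cost)
    return pop[0], cost(pop[0])


def sphere(v: Sequence[float]) -> float:
    """Toy stand-in for the ANN's MSE: minimum 0 at the origin."""
    return sum(x * x for x in v)


best, best_cost = run_ga(sphere, n_genes=5)
```

Because the parents survive unchanged into the next generation, the best cost can never increase from one generation to the next.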

Entropy Based GA for Optimization
For a discrete random variable $X$ taking values $x_1, \dots, x_n$ with probabilities $p(x_1), \dots, p(x_n)$, the Shannon entropy is defined as
$$ H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i), $$
where, in this work, the entropy is evaluated for $G^i_j$, a vector of weights and biases called a chromosome. The proposed cascaded design combines three separate modules: an entropy calculator, a GA optimizer, and an ANN approximator. The entropy calculator controls the uncertainty under the constraint of minimum randomness. The resulting vector, which is then forwarded to the GA module for optimization, comprises weights and biases with either minimum entropy or maximum repetition. Finally, the third module uses the optimized weights and biases to train the ANN for the final approximation. Figure 4 summarizes the proposed methodology, and hence the major contribution of this work, as a flow chart, and the proposed curve approximation with optimized coefficients is given in Algorithm 1.
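One way to realize the entropy calculator is sketched below. Since the entropy of real-valued weights is only defined after discretization, the sketch assumes that the genes of each candidate vector are binned into equal-width bins over [-1, 1] and measures entropy in bits; it then draws k candidate vectors and keeps the one with the lowest entropy. The bin count, the range, the uniform candidate distribution, and the function names are assumptions, not the authors' settings.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def shannon_entropy(genes: Sequence[float], n_bins: int = 16,
                    lo: float = -1.0, hi: float = 1.0) -> float:
    """Entropy (in bits) of the histogram of gene values over n_bins equal-width bins."""
    width = (hi - lo) / n_bins
    # Map each gene to a bin index, clamping values that fall outside [lo, hi].
    counts = Counter(max(0, min(int((g - lo) / width), n_bins - 1)) for g in genes)
    n = len(genes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def min_entropy_vector(n_genes: int, k: int = 200, seed: int = 0) -> Tuple[List[float], float]:
    """Draw k random weight/bias vectors and keep the one with the lowest entropy."""
    rng = random.Random(seed)
    best: List[float] = []
    best_h = math.inf
    for _ in range(k):
        candidate = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
        h = shannon_entropy(candidate)
        if h < best_h:
            best, best_h = candidate, h
    return best, best_h


weights, h = min_entropy_vector(n_genes=50)
```

The selected vector would then seed the GA population, in place of a single unconstrained random draw.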

Design Parameters
The number of hidden layers is fixed at three, with 27, 17, and 8 neurons, respectively; this network configuration is denoted {27 17 8}. The mutation rate is set to 0.2, and the entropy count variable $\kappa$ is set to 200 for the optimal solution. The maximum number of GA generations and the maximum number of ANN epochs are both fixed at 1000, while the GA tolerance is set to 1 × 10⁻⁸. The learning rate $\delta_r$ and the initial step size $\delta_s$ are set to 0.9 and 0.8, respectively.
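The design parameters above can be collected into a single configuration object. The field names are hypothetical, and the chromosome-length example assumes two inputs and one output purely for illustration, since the input and output dimensions are not stated in this section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DesignParameters:
    hidden_layers: Tuple[int, ...] = (27, 17, 8)  # network configuration {27 17 8}
    mutation_rate: float = 0.2
    entropy_count: int = 200        # kappa: number of candidate vectors for the entropy step
    max_generations: int = 1000     # GA
    max_epochs: int = 1000          # ANN training
    ga_tolerance: float = 1e-8
    learning_rate: float = 0.9      # delta_r
    initial_step_size: float = 0.8  # delta_s

    def chromosome_length(self, n_inputs: int, n_outputs: int) -> int:
        """Total number of weights and biases of the fully connected network."""
        sizes = (n_inputs, *self.hidden_layers, n_outputs)
        return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


params = DesignParameters()
print(params.chromosome_length(n_inputs=2, n_outputs=1))  # 710 under the assumed 2-input, 1-output layout
```

Under that assumed layout, each chromosome handled by the GA would carry 710 genes.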

Table 1 presents a numerical comparison of five training algorithms, namely Levenberg–Marquardt (LM), three variants of Conjugate Gradient (CG), and Bayesian regularization, using three GA crossover techniques over four geometries. Clearly, the linear-BGA crossover technique combined with Bayesian regularization outperforms the other methods in terms of both epochs and MSE for all geometries. For this combination (linear-BGA + Bayesian regularization), the number of epochs lies in the range 36–47, while the minimum MSE lies in the range 6.89 × 10⁻⁹ to 8.87 × 10⁻⁹. The worst performance with linear-BGA is that of CGP backpropagation, with 164–394 epochs. Similarly, with the other crossover techniques, Bayesian backpropagation still achieves the minimum MSE and the fewest epochs.
Figure 5 compares the MSE of the selected crossover techniques on all four geometries. The plots show that the proposed algorithm performs best when linear-BGA is the selected crossover method. For a fair comparison, prediction for the same geometries and algorithms was also performed with conventional ANN techniques, as proposed by the benchmark solutions; the results are tabulated in Table 2. The epochs and MSE obtained by the proposed method are significantly improved, which supports the effectiveness of our design in terms of prediction accuracy and consistency. Figure 6 compares the predicted and the physically measured IV curves. Note that the curve shown was not included in ANN training; it was measured at a temperature of 8.65 K and a magnetic field of 41 Oe, the same values used by the benchmark works, to cross-check the obtained result.

Conclusions
Because the network coefficients are generated from random initial conditions, artificial neural networks mostly converge to a different local minimum on every execution. This generally leads to inconsistent prediction accuracy, even for an identical experimental setup and identical tunable parameters. In this work, we have proposed an entropy based genetic algorithm and used it to control the randomness in coefficient generation. This technique drives the network to converge to (or close to) the global minimum on every execution, which keeps the prediction accuracy, measured in mean-squared error, within an acceptable level. We applied our technique to approximate the current–voltage curves of four different lattices on a superconducting film, and compared our results with three recent works that used artificial neural networks for the same purpose. Our results show that the proposed methodology yields better consistency and greater prediction accuracy.

Figure 2. The IV characteristics of four geometries at different temperatures and magnetic fields: top row, left to right, rectangular and square; bottom row, left to right, honeycomb and kagome.

Figure 3. MSE achieved by a benchmark work for three ANN architectures.Reprinted from [10] with permission.

Figure 4. Flow chart depicting the proposed methodology.

Figure 5. MSE comparison of various crossover techniques on the selected geometries.

Algorithm 1: Entropy based GA for weights and biases optimization
Result: Curve approximation
1 Begin:
2 Initialize the step size ($\delta_s$), learning rate ($\delta_r$), $k$, and the weights and biases vector $\delta_{WB}$.
3 Calculate the entropy $\delta^k_E$ of vector $\delta^k_{WB}$ using the Shannon entropy, $k$ times.
4 Calculate $F_{maxrep}$ and $V_{min}$ from $\delta^k_E$.
… Weights and biases vector with entropy $\delta_{WB}$)
12 $t_{GA} \leftarrow 0$
13 $pop_0 \leftarrow$ Initialize from $\delta_{WB}$ with ($pop_{size}$)
14 Evaluate $pop_0$
15 while $t_{GA} < Gen_{max}$ do
16 Parents ($\chi_{par}$) $\leftarrow$ Select $\chi_{par}$ from $pop_t$
17 Offspring ($\chi_{ofs}$) $\leftarrow$ crossover($\chi_{par}$, $C_{prob}$)
18 mutation($\chi_{ofs}$, $M_{prob}$)
19 Evaluate $\chi_{ofs}$

Table 1. Comparison of ANN training algorithms with multiple GA operators.

Table 2. Comparison of various ANN training algorithms as used in the benchmark works.