1. Introduction
The rapid advancement of mobile communication systems in recent years has been accompanied by increasing network congestion, particularly in indoor environments. Technologies offering very wide bandwidth are required to address the growing bandwidth requirements. One such technology, defined by the Federal Communications Commission, is ultrawideband (UWB). UWB technology offers a wide operational frequency range from 3.1 GHz to 10.6 GHz with an effective isotropic radiated power density of −41.3 dBm/MHz (approximately 75 nW/MHz). The low permissible radiated power confines UWB to short-range, high-speed data communication, making it well suited for indoor applications. The antenna is one of the major sub-components that can degrade overall system performance. Printed monopole antennas are preferred for modeling UWB antennas because they provide many merits, such as low cost and weight, easy fabrication, a low profile, and ease of integration with other radio sub-system blocks in the transceiver. Recent studies in UWB monopole antenna design exhibit various methodologies, such as fractal-based and Vivaldi UWB antennas, parasitic elements incorporated onto the radiating patch and ground plane to achieve the UWB spectrum, metamaterial loading, and so on. Each of these methodologies is distinguished by its antenna structure and geometry, and the optimal dimensions of the structure are found by performing a detailed parametric analysis. Such a comprehensive search for the ideal dimensions is a time-consuming process that requires substantial computational resources. As a result, researchers and enterprises are paying attention to emerging technologies such as artificial intelligence (AI) in antenna design.
In the past decade, AI has been applied to solve diverse problems in engineering, economics, medicine, and other fields. Machine learning, considered a subset of AI, uses computational statistics to define the relationship between input and output by developing a mathematical model. Because it is data-driven, the trained model can be used to interpolate the output from the input. This notion forms the basis for applying machine learning algorithms in microwave engineering, with the end goal of minimizing the iterations needed to select optimal parameters when designing antennas with specific requirements. The relationship between antenna dimensional parameters and performance parameters is nonlinear; thus, a regression model is best suited to determine the design parameters for specific performance values [
1].
High-dimensional nonlinear data can be optimized using machine learning [
2]. Several researchers have employed soft computing approaches to determine the parameters. Zhang et al. [
3] used particle swarm optimization with a convolutional neural network (CNN) for fragmented antennas. The scattering parameters of capacitively fed antennas were determined using a multilayer perceptron in [
4]. In [
5], a deep belief network based on a Bayesian mechanism is used to determine the coupling matrix. Similarly, particle swarm optimization has been used extensively in recent years for the optimization of deep neural networks [
6,
7,
8]. In [
9], an extreme learning machine model has been employed to determine the UWB antenna parameters. Optimization algorithms such as the simulated annealing algorithm, genetic algorithm (GA), ant colony optimization, differential evolution, grey wolf optimization, sine-cosine optimization, and many others can be used to optimize the design parameters of the antenna [
10,
11,
12,
13,
14,
15]. Designing antennas using theoretically derived antenna parameters is a laborious, complex, and time-consuming process. Antenna performance goals such as multi-band operation, wide bandwidth, and high gain can be achieved by intelligently tuning the design parameters and geometric properties using machine learning algorithms. The contributions of the proposed research are as follows:
An efficient method for feature extraction using statistical and regression properties is proposed.
A comparative analysis of regression models for studying the effect of antenna design parameters on performance is performed.
An optimized random forest model is designed to effectively determine the S-parameter values for the corresponding dimensional parameters of the UWB antenna.
The organization of the paper is as follows: the antenna design is explained in
Section 2, and the methodology is described in
Section 3. The results obtained from the study are presented in
Section 4. Future aspects and the implications of the study are discussed in
Section 5.
2. UWB Antenna Design
The circular monopole antenna is first designed using the empirical formulas represented in Equations (1) and (2) [
16].
The radius (r) of the circular patch is determined by

$r = \dfrac{F}{\sqrt{1 + \dfrac{2h}{\pi \varepsilon_r F}\left[\ln\left(\dfrac{\pi F}{2h}\right) + 1.7726\right]}}$  (1)

where h is the thickness of the substrate, $\varepsilon_r$ is the dielectric constant of the FR-4 substrate, equal to 4.4, and F is given by

$F = \dfrac{8.791 \times 10^{9}}{f_r \sqrt{\varepsilon_r}}$  (2)

where $f_r$ is the resonant frequency of the antenna (in Hz, with r, h, and F in cm), taken as 7.5 GHz for the initial calculation of the circular monopole antenna.
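As a quick check, the two equations above can be evaluated numerically. This is a minimal sketch; the substrate thickness h = 1.6 mm is an assumed typical FR-4 value, not stated in the text at this point.

```python
import math

def patch_radius(f_r_hz, eps_r, h_cm):
    """Circular patch radius (cm) from Equations (1) and (2)."""
    # Equation (2): first-order radius estimate F (cm)
    F = 8.791e9 / (f_r_hz * math.sqrt(eps_r))
    # Equation (1): fringing-field correction
    return F / math.sqrt(
        1 + (2 * h_cm / (math.pi * eps_r * F))
        * (math.log(math.pi * F / (2 * h_cm)) + 1.7726)
    )

# Paper's values: f_r = 7.5 GHz, eps_r = 4.4; h = 1.6 mm is an assumption.
r = patch_radius(7.5e9, 4.4, 0.16)
print(f"patch radius ≈ {r * 10:.2f} mm")
```

The correction term in Equation (1) shrinks the first-order radius F to account for fringing fields at the patch edge.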
To realize the UWB spectrum, the monopole antenna is modified by incorporating a rectangular stub on the radiator. Further, the ground plane of the proposed antenna is lowered, as depicted in
Figure 1. The circular monopole antenna is designed using the above empirical equations, and the ground plane of the proposed antenna is then lowered to enhance the impedance bandwidth. The lowered ground plane alters the current distribution and affects the transmission-line characteristics, reducing the quality factor and improving the bandwidth. Additionally, a stub on the radiator further improves the impedance bandwidth at the lower and higher frequencies of the UWB spectrum. The antenna design parameter values were obtained from HFSS simulations. Geometrical modification of the antenna has a significant impact on its performance, such as bandwidth, gain, and radiation characteristics. Therefore, the geometrical parameters were varied from their minimum to maximum values in incremental steps of 0.1 mm, and the antenna performance was recorded at each step. In this manner, a dataset was created to obtain automated, ideal values of the antenna structure that improve antenna performance. The designed antenna was fabricated on an FR-4 substrate and measured using a vector network analyzer. The geometrical details of the UWB antenna and its prototype are represented in
Figure 1a,b, respectively. The geometrical details of the UWB antenna are given in
Table 1.
Figure 1c represents the reflection coefficient of the proposed antenna, measured after fabricating the antenna on the FR-4 substrate. The simulated and measured results agree well, with the antenna operating from 3.1 GHz to 11.5 GHz. The antenna dataset was created by performing a comprehensive parametric analysis of the UWB antenna geometry and extracting the reflection coefficient responses for each change in the antenna's geometry, as described below.
Parametric Analysis
The change in the antenna's dimensions and its impact on antenna performance are examined through a parametric analysis. This section presents a detailed analysis of the antenna structure used to attain the UWB and dual band-notch frequency features. First, to achieve the UWB frequency spectrum, the ground plane is lowered to less than a quarter of the full ground plane. The design is first simulated with a full ground plane, i.e., Z9 = 21.5 mm, which does not exhibit a −10 dB bandwidth up to 11.5 GHz. Z9 is then reduced in descending order through 21.5, 17.5, 13, 8, and 4 mm. Each value of Z9 yields a different S11 curve, as shown in
Figure 2. From this figure, it can be observed that for Z9 down to 13 mm the reflection coefficient does not fall below −10 dB. A ground plane length of 4 mm yields an impedance bandwidth between 2.9 and 5 GHz. Consequently, to extend the bandwidth further across the UWB range, a rectangular stub is embedded between the circle and the feedline of the radiator, as depicted in
Figure 1a. The optimal length (Z6) and width (Z5) of the rectangle are chosen by performing a parametric study. The initial values of Z6 and Z5 are 3.1 mm and 5 mm, respectively. These values are incremented sequentially by 0.5 mm and 1 mm up to 5.1 mm and 9 mm. The corresponding S11 curves are depicted in
Figure 3. The optimal dimensions of the rectangle are found to be 4.1 mm in length and 7 mm in width, providing an impedance bandwidth of 115% over the frequency range 3.1–11.5 GHz. Other parameter values are extracted similarly to create the dataset.
3. Materials and Methods
3.1. Dataset
The dataset for the study is obtained from a composite circular and rectangular-shaped UWB antenna structure with a simulated and measured operating frequency from 3.1 to 11.5 GHz and an impedance bandwidth of 115%.
3.2. Stepwise Regression Technique for Feature Reduction
The initial step involves fitting the model to each individual predictor to determine the predictor with the lowest p-value (statistical significance threshold set to 0.05). This is performed for all individual predictors, with the combination of features changing at each step. The end goal is a predictor set in which every feature has a p-value below the threshold.
Figure 4 illustrates the stepwise regression methodology, and Equation (3) gives the criterion for evaluating a feature subset:

$M = \dfrac{k\,\overline{r}_{cf}}{\sqrt{k + k(k-1)\,\overline{r}_{ff}}}$  (3)

where M is the criterion for evaluating a subset of k features, $\overline{r}_{cf}$ is the average feature–class correlation, and $\overline{r}_{ff}$ is the average correlation between a pair of features.
Based on the aforementioned criterion, a set of three attributes, namely the frequency, length, and width of the antenna, was found to be highly correlated with the class while retaining a low degree of inter-correlation with each other.
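The subset-evaluation criterion in Equation (3) can be sketched directly from sample correlations. This is an illustrative sketch on synthetic data, not the paper's dataset; the feature names in the comments are only placeholders.

```python
import numpy as np

def merit(X, y):
    """Subset merit M = k*r_cf / sqrt(k + k*(k-1)*r_ff) for features X, class y."""
    k = X.shape[1]
    # average absolute feature-class correlation
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(k)])
    if k == 1:
        return r_cf
    # average absolute feature-feature correlation over all pairs
    pairs = [abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
             for i in range(k) for j in range(i + 1, k)]
    r_ff = np.mean(pairs)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # e.g. length, width, frequency
y = 0.8 * X[:, 2] + 0.1 * rng.normal(size=100)     # class driven mostly by one feature
print("merit:", merit(X, y))
```

A stepwise search would evaluate this merit for candidate subsets and keep the subset that maximizes it.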
3.3. Regression Analysis for Prediction of Antenna Parameters
For validation, ten regression models are used, namely (i) multilayer perceptron regression, (ii) Bayesian additive regression tree, (iii) AdaBoost of decision stump trees, (iv) random forest, (v) decision table, (vi) Gaussian regression, (vii) lazy BK, (viii) K-star, (ix) locally weighted linear regression, and (x) support vector regression. The dataset of antenna attributes for parameter prediction is tested under two different mechanisms: (a) the ten-fold cross validation method and (b) a 60:40 split, with 60% of the data used for training and 40% used for testing. The dataset consists of length, width, and frequency attributes. The length and width are varied in steps of 0.1 mm for different frequency values ranging from 1 GHz to 15 GHz, resulting in 142 frequency values. The corresponding S11 parameters for the different values of length, width, and frequency were computed using HFSS. The goal of the regression model is to predict the S-parameter values for the corresponding values of the antenna attributes. In this regard, ten different regression models were tested. The parameter values were set initially by considering the best correlation coefficient obtained for the corresponding frequency and dimensions. Further, to test the generalization ability of the model for adoption in practical UWB antenna scenarios, ten-fold cross validation was performed using the optimized, tuned regressor values obtained from the 60:40 train/test split.
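The two evaluation protocols above can be sketched as follows. This is a minimal sketch with synthetic stand-in data; the real dataset holds length, width, and frequency as predictors and S11 as the target, and the model shown (a random forest) is only one of the ten compared.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(42)
X = rng.uniform(size=(142, 3))                               # length, width, frequency
y = -10 - 15 * np.sin(6 * X[:, 2]) + rng.normal(0, 1, 142)   # toy S11 values (dB)

# (a) 60:40 hold-out split, used to tune the model
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=1)
model = RandomForestRegressor(random_state=1).fit(X_tr, y_tr)
print("hold-out R^2:", model.score(X_te, y_te))

# (b) ten-fold cross validation with the tuned configuration
scores = cross_val_score(RandomForestRegressor(random_state=1), X, y, cv=10)
print("10-fold mean R^2:", scores.mean())
```

The split tunes hyperparameters; the cross validation then checks that the tuned model generalizes across the whole dataset.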
3.3.1. Multilayer Perceptron Regressor
A multilayer perceptron (MLP) regression model is used in machine learning, and in antenna design in particular, to map parameters from one space to another, where each space can have any number of dimensions. The MLP regression model trains iteratively, optimizing the squared error using stochastic gradient descent. The correlation coefficient was used as the evaluation metric for the loss at each epoch. The rectified linear unit computes max(0, x), whereas the identity activation is f(x) = x. The activation function in the last layer was initially the rectified linear unit; several other activation functions, such as the sigmoid and tangent functions, were then tested, and the identity function was found to be the best activation function for the last layer. At each iteration, the partial derivatives of the loss function are computed to update the parameters. The L2 regularization term is set to 0.0001; it is divided by the sample size after being added to the loss at each epoch. An adaptive learning rate of 0.001 is used: it is kept constant as long as the training loss decreases gradually at each epoch; otherwise, it is halved. The number of iterations is set to 200. Continuous output values are obtained because the squared error is used as the loss function, and weights with larger magnitudes are penalized by the L2 regularization. The decision function for different values of alpha is given in
Figure 5. These values were optimized on the 60:40 train/test model and held constant for the cross validation set-up.
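The MLP configuration described above maps onto a standard library regressor. This is a hedged sketch: sklearn's MLPRegressor always applies the identity activation at the output, matching the choice reported in the text, but the hidden layer size and the synthetic data are assumptions.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
y = (y - y.mean()) / y.std()          # standardize the target for SGD stability

# Squared-error loss, SGD, L2 alpha = 0.0001, adaptive learning rate
# starting at 0.001, 200 iterations -- the settings stated in the text.
mlp = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                   solver="sgd", alpha=1e-4,
                   learning_rate="adaptive", learning_rate_init=0.001,
                   max_iter=200, random_state=0)
mlp.fit(X, y)
print("train R^2:", mlp.score(X, y))
```

Note that sklearn's "adaptive" schedule reduces the learning rate only when the training loss stops improving, which is the behavior the text describes.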
3.3.2. Bayesian Additive Regression Tree (BART)
The BART technique is a Bayesian ensemble technique that uses Bayesian mechanisms to determine the posterior probabilities. What makes BART Bayesian, in contrast to ordinary regression trees, is its use of priors: the priors favor shallow trees, with leaf values tending towards zero. A good, flexible approximation to the test set is obtained by summing several trees. A Markov chain Monte Carlo algorithm with backfitting is applied iteratively. The predictor space is partitioned into hyper-rectangles to approximate an unknown function. The schematic representation of the BART model is shown in
Figure 6. Mathematically, the BART model is represented as in Equation (4):

$y = \sum_{j=1}^{m} g(x;\, T_j, M_j) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2)$  (4)

where $T_j$ is the structure of the j-th tree and $M_j$ is its set of leaf values.
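The sum-of-trees structure in Equation (4) can be illustrated by backfitting shallow trees to residuals. This is only an illustration of the additive decomposition: a real BART model samples the trees with MCMC under priors on depth and leaf values, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

n_trees = 20
trees = [DecisionTreeRegressor(max_depth=2, random_state=t) for t in range(n_trees)]
contrib = np.zeros((n_trees, len(y)))        # each shallow tree's contribution

for _ in range(5):                           # backfitting sweeps
    for t, tree in enumerate(trees):
        # refit tree t to the residual left by all the other trees
        residual = y - (contrib.sum(axis=0) - contrib[t])
        tree.fit(X, residual)
        contrib[t] = tree.predict(X)

pred = contrib.sum(axis=0)                   # the sum-of-trees prediction
print("train RMSE:", np.sqrt(np.mean((y - pred) ** 2)))
```

Each depth-2 tree alone is a weak learner; the flexibility comes from the sum, exactly the intuition behind Equation (4).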
3.3.3. AdaBoost of Decision Stump Trees
Decision trees with a single level of splitting are termed decision stumps. The main reasons for using an ensemble model are to decrease the variance and bias and to improve the prediction rate. The weighted sum of the sample weights always equals one. The actual influence of each stump over the entire ensemble is then calculated: based on the misclassification error obtained during training, the value of alpha varies; a negative value indicates strong disagreement between the predicted and actual values, whereas a positive value indicates strong agreement between them. At each round, an ensemble model is created by giving higher weight to the misclassified points. Multiple iterations build a strong decision boundary from the weak learners. The learning rate hyperparameter is initially set to 0.01 for training the model; a smaller learning rate increases the computational time. The schematic representation of the decision stump tree model is shown in
Figure 7.
3.3.4. Random Forest
Decision trees with bagging are trained using bootstrap aggregation. The initial step is random sampling with replacement over the n instances in the dataset. For an individual tree, the number of features is three, and 20% of the variables are used in an individual run. Since there are three attributes, a value smaller than three is selected to split each node, and the same value is held constant for the entire growth of the tree. The error rate depends on the correlation between the trees and on the strength of each individual tree in the forest; a tree with a lower error rate is considered stronger. Algorithm 1 provides the pseudocode for generating the decision trees.
Algorithm 1: Generate Decision Trees
Input: (Sample S, Features F)
1. If stopping_condition(S, F) = true then
  a. leaf = createNode()
  b. leaf.label = classify(S)
  c. return leaf
2. root = createNode()
3. root.test_condition = findBestSplit(S, F)
4. V = {v | v is a possible outcome of root.test_condition}
5. For each value v ∈ V:
  a. S_v = {s | root.test_condition(s) = v and s ∈ S}
  b. child = TreeGrowth(S_v, F)
  c. Add child as a descendant of root and label the edge {root → child} as v
6. return root
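The bagged-tree setup described above can be sketched as follows. This is a hedged sketch: bootstrap sampling and a per-split feature subset smaller than the three available attributes come from the text, while max_features = 1 and the synthetic data are assumptions.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

# Bootstrap aggregation of trees, each split choosing from a random
# subset of features smaller than the full attribute set.
rf = RandomForestRegressor(n_estimators=100, max_features=1,
                           bootstrap=True, random_state=0)
rf.fit(X, y)
print("train R^2:", rf.score(X, y))
```

Keeping max_features below the attribute count de-correlates the trees, which the text identifies as the driver of the forest's error rate.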
3.3.5. Decision Table
It is an accurate methodology relative to decision trees, in which an ordered set of if-then rules is used for numeric prediction, with decision trees considered as the base. Three attributes are used consecutively, and three rows are used to build the decision table. The goal of the decision table is to generate rules for structuring the attributes; the same set of rules is then used for the cross validation set. The main rule formulated for our application is based on the distance from the query attribute to the attributes in the training set: the entry corresponding to the least distance is chosen. Prediction is performed by iteratively allocating each newly arrived attribute to a category. The performance of an attribute is tested using the best subset of cross validation attributes. The best-first search strategy avoids getting stuck in local maxima and is thus adopted to search the attribute space [
17].
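The nearest-match rule described above can be sketched directly: a query attribute vector is matched to the stored row at the smallest distance, and that row's value is returned. The table entries below are illustrative, not values from the paper.

```python
import numpy as np

# Illustrative decision-table rows: (length, width, frequency) -> S11
table_X = np.array([[10.0, 8.0, 3.1],
                    [10.0, 8.0, 7.5],
                    [12.0, 9.0, 10.6]])
table_y = np.array([-12.0, -25.0, -18.0])

def lookup(query):
    """Return the S11 entry of the row nearest to the query attributes."""
    d = np.linalg.norm(table_X - query, axis=1)
    return table_y[np.argmin(d)]

print(lookup(np.array([10.0, 8.0, 7.4])))   # matches the second row -> -25.0
```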
3.3.6. Gaussian Regression
A set of random variables with a joint Gaussian distribution is used in the Gaussian process [
18]. In Gaussian regression, the prior mean is taken as the mean value of the dataset. Hyperparameter optimization is performed by maximizing the log-likelihood function. Subsequent iterations are then conducted from the initial parameter values as described in Algorithm 2. The process is completely specified by its mean, variance, and covariance functions. The algorithm for Gaussian regression is as follows:
Algorithm 2: Gaussian Regression
1. Input: D = [x_i, y_i], where 1 ≤ i ≤ N; x is the attribute vector, y is the prediction, and N is the number of data points. x = {length, width, frequency}; y = S11 parameter.
2. A Gaussian function is fitted to the data: y_i = f(x_i) + ε, where ε ~ N(0, σ²) is the Gaussian noise term with zero mean and variance σ².
3. Kernels: radial basis function, polynomial, normalized polynomial, and Pearson VII.
4. Output: V, the predicted value of the attribute.
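Algorithm 2 can be sketched with a library Gaussian process using the RBF kernel, one of the kernels listed in step 3; the noise term is modeled with a white kernel, and the hyperparameters are tuned by maximizing the log marginal likelihood, as the text describes. The one-dimensional data are a synthetic stand-in.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

# RBF kernel plus a noise term; hyperparameters are fitted by
# maximizing the log marginal likelihood during fit().
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=0)
gp.fit(X, y)

mean, std = gp.predict([[5.0]], return_std=True)
print("prediction:", mean[0], "+/-", std[0])
print("train R^2:", gp.score(X, y))
```

The predictive standard deviation is a by-product of the posterior, which is one practical advantage of the Gaussian process over the other regressors compared here.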
3.3.7. Lazy BK
The objective function in the k-nearest-neighbour model is computed using an estimation function. The lazy BK model, as its name suggests, is responsible for checking the outputs and properly packing the data values. The number of neighbours is controlled by identifying and validating the training dataset linearly as well as quadratically. Due to this, lazy BK is adopted even when the application varies [
19]. The number of nearest neighbours k is predicted using an upper bound of 2, or it can be set according to the best performance obtained through cross validation. Several distance measures were used, such as the Chebyshev, Manhattan, Euclidean, and Minkowski distances; the best performance was obtained using the Euclidean distance.
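The selection described above, Euclidean distance with k bounded by 2 and chosen by cross validation, can be sketched with a k-nearest-neighbour regressor as an analogue of lazy BK. The synthetic data are an assumption.

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=150, n_features=3, noise=5.0, random_state=0)

# Cross-validated choice of k with an upper bound of 2, Euclidean metric.
search = GridSearchCV(KNeighborsRegressor(metric="euclidean"),
                      {"n_neighbors": [1, 2]}, cv=5)
search.fit(X, y)
print("best k:", search.best_params_["n_neighbors"])
```

Being a lazy learner, the model defers all computation to query time: prediction averages the target values of the k closest stored instances.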
3.3.8. K-Star
The K-star algorithm uses an entropy-based distance metric to compute the variation of the data from the training set [
20]. The entropy function is calculated using the mean value for transforming one instance into another; the probability of this transformation occurring is modeled as a "random walk". Here, for each class, a set of selected features (dimensions and frequency) is used as the input to the K-star model, and missing values are replaced by the average of the corresponding neighbouring values; in our case, however, there are no missing values in the dataset obtained from the HFSS simulations. The regression result is computed from the sum of the transformation probabilities with respect to the distance, and the highest probability value is selected as the class or variable value for the test attribute.
3.3.9. Locally Weighted Linear Regression (LWL)
Linear regression is a supervised algorithm for determining the linear relation between the predictor and the output [
21,
22]. The LWL approach assumes that the data are locally linear; since the data in our application are non-linear, the locally weighted regression model is employed. The total cost function is divided into multiple smaller independent cost functions. Each datapoint carries a weighting factor that expresses its influence over the predictions. The initial cost function is computed as in Equation (5):

$J(\theta) = \sum_{i} \left(\theta^{T} x^{(i)} - y^{(i)}\right)^{2}$  (5)

where x comprises the length, width, and frequency attributes and y is the S11 parameter. For each query point x, a value of θ is computed. Higher priority is given to training points in the vicinity of x than to points farther away. Based on this, the cost function J(θ) is modified as in Equation (6):

$J(\theta) = \sum_{i} w^{(i)} \left(\theta^{T} x^{(i)} - y^{(i)}\right)^{2}$  (6)

The weight $w^{(i)} = \exp\left(-\dfrac{(x^{(i)} - x)^{2}}{2\tau^{2}}\right)$ is generated for each query point using the exponential function. It follows that training points near the query contribute significantly more to the cost function than points located farther away. The graph of the predicted fit is given in
Figure 8.
As can be observed from
Figure 8, a curved regression line fits the data values for the corresponding predictions.
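The per-query fit described above can be sketched directly: for each query x, Gaussian weights down-weight far-away training points and a weighted least-squares solve gives θ for that query. The bandwidth τ and the one-dimensional toy data are assumptions.

```python
import numpy as np

def lwl_predict(Xq, Xtr, ytr, tau=0.5):
    """Locally weighted linear regression prediction for query points Xq."""
    A = np.c_[np.ones(len(Xtr)), Xtr]            # design matrix with intercept
    preds = []
    for x in Xq:
        # Gaussian weights: w_i = exp(-(x_i - x)^2 / (2*tau^2))
        w = np.exp(-np.sum((Xtr - x) ** 2, axis=1) / (2 * tau ** 2))
        W = np.diag(w)
        # weighted least squares: theta = (A^T W A)^-1 A^T W y
        theta = np.linalg.solve(A.T @ W @ A, A.T @ W @ ytr)
        preds.append(np.array([1.0, *x]) @ theta)
    return np.array(preds)

rng = np.random.default_rng(0)
Xtr = rng.uniform(-3, 3, size=(200, 1))
ytr = np.sin(Xtr[:, 0]) + 0.05 * rng.normal(size=200)
print(lwl_predict(np.array([[1.5]]), Xtr, ytr))
```

Because θ is recomputed per query, the stitched-together local lines trace a curved fit through non-linear data, as in the figure.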
3.3.10. Support Vector Regression
The support vector regressor (SVR) is the regression counterpart of the support vector machine. The SVR works towards finding the best hyperplane to fit the training data [
23]. The hyperplane that minimizes the cost function is chosen. A batch size of 100 and a minimal tolerance level are considered, along with a strict margin for drawing the hyperplane. A polynomial kernel is used to fit the line to the data.
Figure 9 illustrates the application of SMO to the regression problem. The ordinary least squares method is used to determine the weight vector and the bias such that the error or loss function is minimal. The threshold allowance is set to 0.1, so that datapoints lying within the allowance are not penalized. The vertical distance between the plane and the points is used to compute the error.
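The SVR configuration described above can be sketched as follows. This is a hedged sketch: the polynomial kernel and the ε-tube of 0.1 (inside which points are not penalized) come from the text, while the degree, coef0, the regularization constant C, and the synthetic data are assumptions.

```python
from sklearn.svm import SVR
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=150, n_features=3, noise=0.1, random_state=0)
y = (y - y.mean()) / y.std()          # scale the target so the 0.1 tube is meaningful

# Polynomial kernel, epsilon-insensitive tube of 0.1.
svr = SVR(kernel="poly", degree=2, coef0=1.0, epsilon=0.1, C=1.0)
svr.fit(X, y)
print("train R^2:", svr.score(X, y))
```

Only points outside the ε-tube become support vectors and incur a penalty, which is the mechanism the text describes for the threshold allowance.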