Graphical User Interface for the Development of Probabilistic Convolutional Neural Networks

: Through the development of artiﬁcial intelligence, some capabilities of human beings have been replicated in computers. Among the developed models, convolutional neural networks stand out considerably because they make it possible for systems to have the inherent capabilities of humans, such as pattern recognition in images and signals. However, conventional methods are based on deterministic models, which cannot express the epistemic uncertainty of their predictions. The alternative consists of probabilistic models, although these are considerably more difﬁcult to develop. To address the problems related to the development of probabilistic networks and the choice of network architecture, this article proposes the development of an application that allows the user to choose the desired architecture with the trained model for the given data. This application, named “Graphical User Interface for Probabilistic Neural Networks”, allows the user to develop or to use a standard convolutional neural network for the provided data, with networks already adapted to implement a probabilistic model. Contrary to the existing models for generic use, which are deterministic and already pre-trained on databases to be used in transfer learning, the approach followed in this work creates the network layer by layer, with training performed on the provided data, originating a speciﬁc model for the data in question.


Introduction
Humans can identify patterns and objects, such as a clock, a chair, a dog, or a cat. However, for a computational model, pattern analysis is a considerable challenge. Using artificial intelligence (AI) [1] makes it possible for computer systems to perform pattern recognition. However, it is necessary to develop models capable of learning. Through these models, it is possible to achieve methods that align with how the human brain performs the analysis. Machine learning (ML) [2] can be seen as a subtopic of AI, whose algorithms make it possible for software applications to have high accuracy in predicting outcomes, even if they have not been specifically programmed to do so.
ML is translated into the execution of algorithms, thus automatically creating models capable of learning based on a set of data. It is necessary to train the ML models, giving them an appropriate set of data so that in this training process, the employed algorithm "learns", and iteratively adjusts itself. After training, the created model can make predictions with a high assertiveness value. Despite the substantial number of algorithms developed for ML, artificial neural networks (ANNs) have consistently been demonstrated to be the most adaptable, especially with the emergence of deep learning (DL) [3]. Theoretically, one can use ANNs for the detection of complex patterns, such as those present in images, but in practice, it may be unfeasible due to the number of calculations required being extremely high. This occurs because in a conventional ANN, each pixel is connected to each neuron. In this case, the computational load added makes the network unfeasible for image analysis [4].
Convolutional neural networks (CNNs) have made pattern recognition in images feasible. When considering an image, the proximity of the pixels has a strong correlation, and the CNNs specifically use this relationship. Nearby pixels have a higher probability of being related than pixels further apart. Convolution solves the problem of the need for a high number of connections that occurs in conventional ANNs, making it possible to process images by filtering connections by proximity. In a given layer, instead of connecting each input to each neuron, CNNs restrict the connections intentionally so that any neuron accepts inputs only from a small part of the previous layer (such as 5 × 5 or 3 × 3 pixels).
The evolution of graphics processing units (GPUs) and the availability of libraries, such as TensorFlow (TF) [5] and Keras [6], have substantially facilitated the development of CNNs. Despite the high accuracy that DL-based models can achieve, the problem associated with the level of confidence that the user has in the prediction of the network still remains. This can be solved through the use of probabilistic models (PMs), as they allow the expression of epistemic uncertainty in the predictions generated by the models. These models are computationally demanding and have only recently become feasible with the development of variational inference (VI) [7]. The creation of probabilistic CNNs became simpler through the emergence of models based on the estimators, such as Flipout, and their inclusion in TF probability [8].
Pre-trained deterministic models (DMs) with conventional architectures are available with optimization performed on large data sets, allowing transfer techniques to be used to achieve high accuracy. However, PMs do not yet have this transfer option because, typically, the distributions that these models learn are strongly associated with the data used to train the models. Thus, it is usually necessary to fully train the models on the train data sets, whose distributions should be learned.
One of the biggest barriers to the use of DL models by inexperienced users is the complexity associated with programming the components. Actually, the need for tools that can help the development of ML models has been identified since the beginning of this field. However, more recently, the Classification Lerner Application from Matlab [9] employed an innovative approach to address this need. It uses a simple button-based interface with short messages to guide the user, allowing any user to develop conventional ML models without requiring a deep understanding of all concepts underlying the models, having only to know how to use the developed model. This no-code approach allows for accelerating the use of these models in diverse areas of knowledge. With the large interest that ML has attracted in recent years, some newer platforms aimed at addressing this need for simple-to-use platforms for ML development. An example of such a platform is the Orange Data Mining application [10], which is free and developed in Python. It is also based on a simple button-based interface with small messages to guide the user.
By analyzing the state-of-the-art of GUIs for developing ML models, a gap was found concerning the PMs, especially with probabilistic CNNs. This gap is of particular relevance since these models require high knowledge in the areas of statistics and DL. Additionally, there is still little documentation on the procedures necessary for the development of these models. This restricts the mass use of these models, being only accessible to specialists with knowledge in the area. Considering the great interest that DL based on probabilistic CNNs has had recently, especially by programmers interested in making applications for pattern recognition, it becomes necessary to obtain a simple, user-friendly, and easy-to-use alternative that allows the creation of the models. Therefore, the main objective of this work was the development of a graphical user interface (GUI) that allows any user without previous knowledge in the area of DL, giving the capability to develop PMs based on CNNs with a no-code approach that follows Matlab's Classification Learner Application concept. Two processes were developed; one containing models with standard architectures in the DL area, and another where the user can create a model layer by layer with the desired architecture, with the possibility of selecting the desired number and type of layers and their characteristics.
This work has the following organization: presentation of the materials and methods in Section 2; performance assessment of the developed algorithms in Section 3; discussion of the results in Section 4; and conclusions of the work in Section 5.

Materials and Methods
The developed GUI presents multiple options that guide the user in the process of selecting the architecture of the ANN to be developed, defining the training procedure, and obtaining the resulting model in a simple and intuitive manner. However, it also allows the inexperienced user to train models with a small number of choices using the pre-defined recommended values for the options. It is only necessary to load the data sets, select the desired model, and start training it. Essentially, the developed GUI was designed to allow the development of a probabilistic CNN to be done in three steps: insertion of the desired parameters for training and choice of the architecture; training of the model; and visualization of the results.
When developing CNNs, one of the first issues that occur is choosing the architecture to be used. This defines the types and sequence of layers, the parameters, the number of inputs, the number of outputs, and the activation functions (AFs). All these options are available in the developed GUI, and the user can either choose the recommended (default) options or can select the available alternatives. If needed, the user can develop new modules for the GUI or edit the existing ones.
The entire code was developed in Python 3, including the GUI and the development of the models, and is available (in open source) in the public repository: (https://github.com/ achaves37/Interface-Grafica-para-desenvolvimento-de-Redes-Neuronais-Convolucionais-Probabiliticas/, accessed on 2 February 2023). Two versions are provided, one called "TestInterface" and another called "Interface". In the latter, the user will need to load his data sets, which are intended to be used for developing the custom models. In contrast, the first uses standard ML datasets (useful for testing proposed architectures and getting familiarized with the interface). All the code is documented and organized to make it possible to add/edit some models in a simple and direct way. It is recommended that users be familiar with basic operations with a Python console [11]. The user and the program interact through Tkinter [12] and IPython. The PMs were also developed in Python, using the TF libraries. However, the computational performance is not affected by the choice of this language because TF internally converts the models described for the compute unified device architecture (CUDA) [13], C++, and is optimized for GPU use.

Convolutional Neural Networks
CNNs are usually developed to be deterministic and are associated with pattern recognition in images or signals, presenting a high efficiency in classification. The process of data classification by the networks is carried out by means of the several layers that compose them [14]. These layers are typically of three types: convolution, pooling, and dense. A data alteration by ANNs takes place in all these layers, usually having non-linear AFs, such as the rectified linear unit (ReLU), to increase the learning capacity. The input and output layers are the ones that interact with the outside (they are not hidden).
CNNs use filters that consist of randomly initialized weights, subsequently updated at each new input, through the back-propagation algorithm to learn the patterns. The region of the input, where the filter is applied, is called the receptive field. Pooling is used to reduce the information from the previous layer, decreasing the number of weights to be learned and avoiding overfitting [15]. As in convolution, an area unit is chosen to traverse the entire output of the previous layer. Furthermore, it is necessary to choose how the filtering will be done. The most used is the choice of the maximum value (maxpooling), where only the largest number of the unit is passed to the output.

Probabilistic Models
One of the main advantages of PMs over DMs is the ability to present the epistemic uncertainty of predictions. However, the development of these PMs is typically more complex, requiring additional knowledge on how to build them, frequently using VI approximation [16]. It allows approximating probabilistic inference as an optimization problem, searching for an alternative a posteriori distribution that minimizes the Kullback-Leibler divergence (KLD) [17] with the true a posteriori distribution. In this way, all the weights W of the network are replaced by a probabilistic distribution, and the distribution is optimized during network training. In this case, the Gaussian distribution [18] is often used to represent these distributions.
Specifically, VI involves selecting a member q from a family of distributions Q, which is similar to the target distribution using KLD to measure the fit. In essence, the idea behind variational inference is to reframe our approximate Bayesian inference problem as an optimization problem. This reframing enables the leverage of the full range of mathematical optimization techniques to tackle the problem. By optimizing the parameters, denoted as θ, for q, the solution that is closest to the conditional interest P(W|D) can be found using KLD, where D is the training data. The optimal solution can be obtained by solving [16] q Since P(W|D) is unknown, a different objective needs to be optimized in order to minimize the KLD. The standard approach is to use the evidence lower bound (ELBO), which can be calculated by [16] − The cost function can be seen as a composition of two parts. The first part of the function is typically viewed as the complexity cost, which leads to the search for solutions whose densities are close to the prior. The second part is dependent on the data; it is usually referred to as the likelihood cost because it describes the probability of the data given to the model, leading to solutions that better explain the observed data. As a result, this cost function interprets a compromise between satisfying the simplicity of the prior probability distribution P(W) and the complexity of the data.
Although this approximation can be achieved through VI, using a prior distribution of the model weights to produce a regularization effect, the exact calculation of the ELBO as presented is computationally prohibitive. Another relevant aspect is the usual approach of using a Gaussian distribution for the model weights, with two parameters, the mean and the standard deviation, as it facilitates implementation. An approximation of the ELBO is required to enable the optimization of the network with gradient descent methods based on this loss function. This approximation is given by [16] To decrease the time required to train the networks, the Flipout estimator [19] is used to minimize the ELBO. Therefore, the output of each layer for all examples in the subgroup X, considering the activation function φ and the average weightsW, is given by [19] Since A and B are independently sampled from ∆W andW, it is possible to backpropagate with respect to X, ∆W, andW, optimizing, in this way, the network parameters. In short, VI directly approximates the posterior distribution with a simpler distribution, avoiding the calculation of the likelihood of all occurrences [20].

Graphical Interface
The developed GUI is shown in Figure 1. This is made up of five tabs: • The first tab, called "Start" gives a brief summary of how the application works in order to use the different data sets. • On the second tab, denoted as "Insert|Run", the user can edit the parameters to develop the desired model and then train the model. • The third tab, named "Results", allows the user to visualize the results. • The fourth tab, designated as "FAQ", allows the user to access tutorial videos (available at https://www.youtube.com/channel/UC9bZjefkicHKC6VJPGLLZJA, accessed on 2 February 2023) and verify the versions, and check if the required libraries for the GUI to work are installed. • Lastly, the fifth tab, called "About", displays a message about the authorship of the development of the GUI. In Figure 2, the "Start" tab is presented, where GUI provides information regarding its use for all datasets. The GUI obtains data from four comma-separated value (CSV) files (stored in specific folders such as DataSet1d or DataSet2d for one-or two-dimensional classification data, respectively), using one for training data, one for training labels, one for testing data, and one for testing labels. The data for training and testing must be organized in rows, where each row represents a sample, and each column denotes a feature of the sample. If two-dimensional data are being used, they must be flattened, and the program will internally recreate the original shape to feed the models. Additionally, the labels file should have one row for each sample in the data file (matching the raw number in the files) and be encoded in decimal or one hot encoding. The user can carry out preprocessing of the data, such as normalization or standardization, and the result can be stored in the data CSV files. All generated results will also be stored in folders according to the used model, as shown in Figure 3.  The subsequent tab called "Insert|Run" is shown in Figure 4. Here, the user can introduce the parameters that will control the models' training. If necessary, the user can compare with the recommended values. These values are already pre-filled, and if the user wants to use those values, he only has to click on the "Insert|Save" button. Otherwise, the user must insert the values and then press the button. By pressing the button, the confirmation message is shown on the IPython console (for visual confirmation of the parameters previously inserted), as shown in the example of Figure 5. The left side of the Figure 5 is for classification using a one-dimensional data set, and the right side is for a regression two-dimensional data set using. The user can also perform regression with one-dimensional data and classification with two-dimensional data. If needed to expand to higher dimensions, the user can add new modules (or edit an existing one) to the GUI, copying the code from the two-dimensional models and changing the number of dimensions to the desired value.   Figure 6 shows the tab named "Insert|Run". Here, the user can select which standard model should be trained, or, if preferred, create a model (create layer by layer). If needed, the user can add new standard architectures (or edit the existing ones) to the GUI, requiring only adding the structure of the model to a predefined file. The default optimizer is the ADAM algorithm. However, the user can change to another optimizer if needed. Additionally, accuracy was the tracked metric for classification while the mean squared error was the examined metric for regression. The training process is already prepared to perform early stopping if a minimum improvement is not attained in a sequence of epochs defined by the user (if the user does not pretend to use early stopping, then it just needs to specify the patience value to be higher than the number of epochs intended to be executed). After pressing the "Train Model" button, the program starts training the selected model. Figure 7 shows the console indicating the progress in training the model. Furthermore, the user can create the model by inserting it layer by layer, having to select the layer type; in this case, convolution, pooling, or dense. The insertion window is shown in Figure 8.  In the "Results" menu, shown in Figure 9, the results of the models' training are presented. The user is required to select which model should be examined. Pressing "See all the results in a plot" displays a single plot with all performance metrics. Furthermore, pressing "See each result in a plot" displays one plot with the chosen performance metric to be plotted. Pressing "Print each result in IPython" displays the values in the IPython console. The tab denoted "FAQ", represented in Figure 10, shows the user information about libraries that should be installed and checks if they are present in the system. Lastly, the "About" tab, represented by Figure 11, shows the user information about GUI development.

Interface Development
The developed GUI was developed in three iterations by having first meetings with machine learning specialists to highlight which requisites they consider to be essential for the tool. A quality attribute workshop was then carried out by five specialists to vote on which requisites they consider relevant. The identified requisites, their attribute, and the number of votes were as follows: • The GUI should allow the examination of 1D and 2D data while keeping the option to easily integrate higher-dimensional data. Usability attribute voted as relevant by three specialists. The GUI execution should be independent of the operating system. Interoperability attribute voted as relevant by two specialists. • The GUI should provide feedback to the user during the training procedure. Availability attribute voted as relevant by two specialists. • The GUI should provide the user with a list of all parameters that are going to be used to train the model. Usability attribute voted as relevant by two specialists.
From the carried-out workshop, it was concluded that the main concerns focused on usability issues related to the graphical interface, i.e., how easy it is for the user to perform tasks. Therefore, the graphical user interface was developed to benefit the usability quality attribute the most.
The second iteration was a discussion with GUI specialists on how to properly present all functionalities while using a simple functionality-oriented interface. After defining the desired entity-relationship model and the use case diagram, a navigation map was proposed with six main scenarios as depicted in Figure 12. These scenarios were as follows: • Scenario 1: represents the beginning of the GUI; information is displayed to guide the user in the process of using the GUI.  Low-and high-fidelity prototypes were made with the specialist during this second iteration. Figure 13 shows all created prototypes, depicting low-fidelity on the top and high-fidelity on the bottom. Usability tests were carried out in the last iteration, which was performed on 13 subjects (ten males and two females) with an age range of 25 to 36 years old (average of 29.4). All subjects had used machine learning before, with eight subjects reporting having substantial experience, while the remaining five reported having minimal to some experience. Several tasks were created to cover all potential usage scenarios, each requiring the selection of a model, training, and analysis of results. Once the tasks were completed, the system usability scale [21] was utilized to assess the system's performance. The outcomes showed an average score of 4.4 out of 5, equivalent to 88%, which falls under the "Excellent" category.
During the test, the users identified two main flaws in the high-fidelity prototypes. The first one was the lack of a button to go back if, by mistake, they chose the wrong model (it required the user to restart the GUI to do so); hence, a button to add this functionality was included. The second one was the lack of relevant information presented in the "FAQ" tab. This problem was addressed by adding a button that leads the user to video tutorials and by including more information regarding the versions of the sued libraries.

Results
The GUI was tested for performing classification and regression tasks with one-and two-dimensional data. The first example to illustrate the GUI procedures is composed of a model for two-dimensional classification. Specifically, image data of hand-drawn numbers (MNIST dataset) were used (available at http://yann.lecun.com/exdb/mnist, accessed on 2 February 2023). For this purpose, the standard LeNet-5 model (available at the GUI), with the default GUI options, was used as presented in Figure 14. It is important to highlight that the purpose of the next examples is to show the use of the developed GUI and not the performance of the models used here as examples.    On the left side of Figure 17, the epistemic uncertainty of the trained model is presented. In this example, one of the points where the uncertainty is higher is chosen, and on the right side, a zoom is performed to identify the sample number. After attaining the sample number, it can then be further analyzed as presented in Figure 18. Pressing the "Check" button the sample is presented. Through IPython, it is possible to check into which class the model classified the sample, what was its true class, and the result of the model classification. It is possible to observe that the model classified the sample number 2927 as class 2. Its true class was 3, that is, the model failed to classify the sample correctly. Finally, in Figure 19, the epistemic uncertainty distributed over the various classes is shown. In the "Results" tab, pressing "See all the results in a plot" displays a single plot in a new window with the most relevant classification performance metrics: accuracy (Acc); area under the receiver operating characteristic curve (AUC); negative predictive value (NPV); positive predictive value (PPV); sensitivity (Sen); specificity (Spe).
The obtained results are shown on the left in Figure 20. The user can select to view only one performance metric to ease the interpretation, as depicted on the right side of Figure 21, where only Acc is shown. In the same tab, pressing "Print each result in IPython" prints the values of the chosen result on the IPython console. All trained models and results are saved in a local folder, allowing the user to retrieve them for further use on other applications. The user can also resume training on a previously trained model and can perform transfer learning by loading a previously trained model and then retraining it on data from another problem.
The subsequent examination was performed on a regression test, where the goal was to determine the house prices in a Boston suburb or town using a standard dataset (Boston housing dataset, available at https://www.cs.toronto.edu/~delve/data/boston/ bostonDetail.html, accessed on 2 February 2023), using 14 features to feed the model. For this purpose, the LeNet-4 architecture was built using the custom model creation option (layer-by-layer specification).
The training performance is presented to the user as shown in the example of Figure 22. Furthermore, the test data predictions are also presented to the user. The PM also allows presenting the 95% confidence interval, as depicted in Figure 23, allowing the user to clearly identify where the model is confident enough on its forecasts.

Conclusions
This work presents a graphical user interface for the development of probabilistic CNNs. This type of model can express epistemic uncertainty, a characteristic that is of great importance since it indicates the level of confidence the model has in its predictions. For instance, in medical fields where the outcome of the decision can affect the diagnosis, knowing the confidence level in the forecast is critical. Through the developed tool, it is possible to build PMs in a no-code approach without requiring in-depth knowledge in the domain of DL with probabilistic models, potentially speeding the development of applications where PMs are necessary. To the best of the author's knowledge, no previous work has provided an interface for PMs, not even in the most popular platforms, such as Matlab Classification Learner (which conceptually is the closest to this work), Orange Data Mining, Rapid Miner, or Dataiku, making a direct comparison unfeasible. Furthermore, the developed tool employs a GUI that was shown, in usability tests, to be easy to use and capable of addressing the needs of users who want to develop probabilistic CNNs in a simple and straightforward way.
The implemented approach allows the user to either use standard architectures (already adapted for implementing a PM) for the model or to create a custom model, having the three most common layers (convolution, pooling, and dense) in DL available. Further-more, the tool is fully adaptable and, if needed, the user can add or edit elements to it, including new standard models, performance metrics, and visualization methods.
The proposed tool was designed to allow the user to develop and test the models, requiring only to load the data (or use the standard datasets available in the tool's database), select or specify the architecture, initiate the training, and then check the test results. The support documentation instructs the inexperienced user on how to use the tool, and the trained models are saved in a standard format, allowing the user to further use them on other applications. The following main paths are proposed as future steps in this research: the first is to implement a web-based version of the tool, allowing the back end to be run on a server while the front end can be executed on a browser; and the second path is to further expand the data visualization resources, allowing the user to carry out exploratory data analysis before starting the model development. Regarding the subsequent improvements, it is intended to add full support for cross validation. It can be done with the current version, but it requires the user to prepare the datasets outside of the developed tool.