Application of Deep Learning Techniques and Bayesian Optimization with Tree Parzen Estimator in the Classification of Supply Chain Pricing Datasets of Health Medications

Abstract: From the development and sale of a product through its delivery to the end customer, the supply chain encompasses a network of suppliers, transporters, warehouses, distribution centers, shipping lines, and logistics service providers all working together. Lead times, bottlenecks, cash flow, data management, risk exposure, traceability, conformity, quality assurance, flaws, and language barriers are some of the difficulties that supply chain management faces. In this paper, deep learning techniques, namely Long Short-Term Memory (LSTM) and One Dimensional Convolutional Neural Network (1D-CNN), were applied to classify supply chain pricing datasets of health medications. Bayesian optimization using the tree Parzen estimator, together with All K Nearest Neighbor (AllKNN), was then used to establish suitable hyper-parameters for both LSTM and 1D-CNN and thereby enhance the classification model. Repeated five-fold cross-validation was applied to the developed models to estimate their accuracy. The study showed that the combination of 1D-CNN, AllKNN, and Bayesian optimization (1D-CNN+AllKNN+BO) outperforms the other approaches employed in this study. Its accuracy, from one-fold to 10-fold, ranged between 61.2836% and 63.3267%, the highest among the models considered.


Introduction
Many diseases affect public health and safety in different parts of the world. Much research has centered on diagnosing and curbing the transmission of diseases such as COVID-19 (henceforth, SARS-CoV-2) [1,2], HIV [3,4], Ebola [5][6][7][8], malaria [9][10][11][12][13][14][15][16][17][18][19][20], hereditary diseases [21,22], monkeypox [23][24][25], tuberculosis [26][27][28][29][30], and other diseases [31][32][33][34][35] by adopting different computational, modeling, and bioinformatics approaches. Recently, the World Health Organization (WHO) declared monkeypox a Public Health Emergency of International Concern [36]. Health medications such as vaccines and liquid and solid medications are essential and in high demand to combat these diseases [37][38][39][40]. Once these medications are produced, a major problem is how to transport and distribute them effectively and efficiently to the regions where they are urgently needed. Health medications form an essential part of human society. However, the supply chain process that health medications undergo before they are safely delivered to their various [...] for a variety of facts for the forecast to be correct. Machine learning presents a range of solutions to the issue of inadequate data for research to be successful. One such method is data augmentation, which greatly increases the variety of data available for training models without the need to collect additional data. Depending on the type of data, several augmentation strategies are utilized in deep learning applications [56,57]. Approaches like Synthetic Minority Over-sampling TEchnique (SMOTE) or Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC) are frequently used to augment plain numerical data [58,59].
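As an illustration of the SMOTE idea mentioned above, the following minimal sketch creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. The function name `smote_like`, the toy data, and the parameters `k` and `seed` are illustrative assumptions; this is not the implementation described in [58,59].

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself)
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        u = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and neighbor
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical two-feature minority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=3)
```

Because every synthetic point is a convex combination of two existing minority samples, it always falls inside the region already occupied by the minority class.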
Depending on the nature of the project, augmentation techniques for large amounts of data, such as images and text, range from straightforward manipulations to neural network-generated data. Incremental learning is a machine learning technique that starts training a model with only a small quantity of data [60]. Learning begins with a fairly basic model that forecasts the estimated value with a certain variability. As a data scientist adds further data instances, the model is trained to predict outcomes more accurately, until the amount of data is eventually sufficient to generate accurate predictions.
Along with supervised learning and unsupervised learning, reinforcement learning (RL) is one of the three fundamental machine learning paradigms. It employs rewards and penalties as cues for appropriate and inappropriate conduct [61]. RL is employed in robotic systems and process control to allow a robot to develop an effective, adaptable control system for itself that learns from its own experience and behavior [62]. When it comes to data, the question arises of whether to employ a data lake or a data warehouse. Data lakes are common in advanced analytics and ML projects because they enable the real-time collection and storage of data from numerous sources. A data warehouse is appropriate for operational processes and daily operations, whereas a data lake is ideal for those who require a thorough study of broad-spectrum data acquired over time. Nevertheless, many businesses increasingly use both types of storage, particularly when a data lake serves as the foundation for a data warehouse that draws on sanitized and structured data from the data lake [63].
Another type of supply chain ML application is computer vision (CV) for inventory control. It is often used in many different settings. It is first used to categorize and tally freshly delivered items. CV also helps in the identification of visible package damage. With the use of computer vision, the program can classify the items it "sees." Robots equipped with cameras, for example, may scan your storage spaces and instantly produce a picture of your goods. Machine learning techniques that may be applied in the CV sector include supervised learning, unsupervised learning, and reinforcement learning [64].
Predictive maintenance of equipment is another common application of machine learning in the supply chain. Working from real-time asset data instead of a predetermined timetable, ML supports both reactive and preventative maintenance of equipment. Supply chain experts can drastically reduce maintenance costs by improving asset upkeep. ML also helps to reduce no-fault-found (NFF) situations, in which a unit deemed to be defective is taken out of service, no abnormality is found, and the device is put back into use without any repairs. The production process becomes more efficient as the number of such incidents decreases [65]. ML also aids in determining a package's location throughout the logistics process. It enables supply chain experts to monitor the whereabouts of cargo while it is being transported and gives insight into the conditions of transportation: with the use of sensors, retailers can keep an eye on variables like temperature, vibration, and humidity. Finally, ML supports real-time route optimization. It keeps track of the weather and the state of the roads and makes suggestions on how to shorten travel time and optimize the route. This allows trucks to be diverted at any moment when a more economical route is available [66].
Machine learning is utilized in warehouses to automate manual tasks, foresee potential problems, and minimize paperwork for warehouse workers. For instance, computer vision enables the management of conveyor belt operations and the forecasting of blockages.
Thanks to NLP and OCR [67], warehouse personnel can automatically recognize the arrival of goods and modify their delivery statuses. Cameras scan the barcodes and inscriptions on the product, and the data are immediately input into the system. Additionally, machine learning assists in programming robots and autonomous vehicles, both of which are commonly utilized in warehouses. Autonomous vehicles and robots assist with receiving, packing and unpacking, transporting, and loading and unloading boxes with the use of system-integrated instructions. In this scenario, computer vision aids in locating a vacant space for a box, monitoring its proper placement, and preventing robot and vehicle accidents in warehouses [68].
This paper is innovative because it hybridizes deep learning and Bayesian optimization with a tree Parzen estimator for the classification of supply chain datasets of health medications. To the best of our knowledge, no study in the literature has adopted such a technique. The major contributions of this work are:
i. A survey of machine learning and deep learning algorithms that have been applied to supply chain management.
ii. Determination of the most appropriate deep learning model for the classification of supply chain health medications.
iii. Development of deep learning and Bayesian optimization techniques with a Tree Parzen Estimator for the classification of supply chain health medications.
iv. Evaluation of the performance of the proposed methods for classifying health medications using different metrics.
The rest of the paper is organized as follows: Section 2 presents a review of related work together with a table summarizing contributions. Section 3 explains the methods used for the classification of supply chain health medications. Section 4 presents the simulation and statistical results of our experiments. Section 5 concludes the paper.

Related Works
Several studies have addressed supply chain management. Wong et al. [69] examine the technological design while considering technical viability in terms of scalability, large data processing, and analytics. For data transfer and communication between members of the community, blockchain technology establishes a network that is largely dependable. However, supply chain management using blockchain technology is susceptible to efficiency and memory issues because of the irreversibility, increasing volume, and heterogeneity of supply chain transaction data on the blockchain peer-to-peer network of different supply chain participants. To uphold the norms of supply chain management using blockchain technology, a blockchain architecture backed by a cutting-edge cloud infrastructure was proposed. The cloud platform supports all required web services with scalability, accessibility, protection, and virtualized computing features that make it easier to distribute, share, and store a sizable amount of immutable transaction records with the help of privacy and confidentiality web services, including status update web services. The limitation of the work is that further quantitative and empirical investigations into the supply chain management performance model are required.
Alnahhal, Ahrens, and Salah [70] examined the dynamic lead-time forecasting that may be done by a logistics firm to optimize temporal cargo consolidation. Consolidating shipments is frequently done to cut the cost of export, but it can lengthen the delivery time. In the study, forecasting is done using real data in a make-to-order supply chain where the logistics provider is unaware of the producers' own data records. Using machine learning techniques like logistic regression and linear regression, forecasting was carried out in stages. The final stage verifies whether the order will arrive during the next delivery week or not. After each cargo delivery, forecasting is reviewed to see whether it would be capable of delivering the present inbound purchases for a specific customer promptly or if it would wait until the next week. The outcomes demonstrated acceptable accuracy expressed in many ways, one of which is dependent on a type I error with an average value of 0.07. Adaptive forecasting is used in the work in order to optimize cargo temporal integration in the aggregation center. The limitations of the study include a lengthy lead time and the assumption that it will take a few weeks. Furthermore, forecasting by the third-party delivery company is unnecessary if suppliers offer correct information about delivery schedules. Suppliers can offer better forecasting because they typically have access to more local knowledge. Temporal consolidation would not be desirable if stock level costs are too high and transportation costs are low.
Abbas et al. [71] developed and deployed a novel drug supply chain management and recommendation system (DSCMR) built on blockchain and machine learning. Its two main parts are a system for managing the drug supply chain based on blockchain and a consumer medicine recommendation system based on machine learning. The first module deploys the medication supply chain management system using open blockchain fabrics. This system can continuously track and monitor the drug distribution process in the intelligent pharmaceutical business. In the machine learning module, the N-gram and LightGBM models are employed to provide clients in the pharmaceutical business with recommendations for the best or highest-rated medications. These models were trained using the well-known drug review dataset, which the University of California made available to the public as part of a fully accessible machine learning collection. The machine learning module is incorporated into the blockchain system with the aid of a REST API. Finally, the authors ran a number of tests to evaluate the effectiveness and usefulness of their proposed solution. The system's limitations are that the network is not very large and that the technology has not been used in real time by pharmaceutical companies to evaluate its effectiveness. Additionally, the accuracy of the machine learning models used in the work is subpar.
Shahbazi and Byun [72] provided a method for manipulating perishable food that integrates the most recent advances in blockchain technology, machine learning technology (ML), and fuzzy logic traceability systems. This technology is known as the blockchain machine learning-based food traceability system (BMLFTS). The proposed system's blockchain technology was created to address issues such as weight, evaporation, warehousing transactions, and shipment times. The blockchain data flow is intended to demonstrate how machine learning was extended to the level of food traceability. Additionally, to extend shelf life, supply chains use precise and reliable data. The proposed solution has limitations dependent on the use of other supply chain applications in the food traceability scenario. The proposed system should not be limited to food traceability; additional analysis features, such as risk management and e-commerce transactions, can be included. These are the two primary aspects that are recommended for further study and refining of the proposed technique. Similar information flows, including risk, material, and value flows, can be addressed by an incorporated strategy to build a supply chain that is more dependable and safer.
Tirkolaee et al. [73] give a general overview of how ML approaches are being used across the supply chain. They present machine learning (ML), one of the most well-known artificial intelligence (AI) methodologies, and its applications in supply chain management (SCM). The study highlights the importance of ML approaches in supplier selection and segmentation, supply chain risk prediction, demand and sales estimation, inventory control, transport and logistics, sustainable development (SD), and circular economy (CE). The study's implications for the major issues and shortcomings are then examined, followed by managerial advice and suggestions for future research. The authors predicted advancements in AI research and proposed more investigation into the use of RL approaches in real-time pricing. The study's main flaw is that it only looked into a small number of algorithms, leaving out many other highly effective deep learning and machine learning methods.
Manasas [74] explains how lead time prediction and minimization using machine learning techniques can help with supply chain management. Lead time has been extensively studied since it is seen as a crucial aspect in both supply chain planning and customer satisfaction. These techniques were used on a sizable set of Greek-headquartered enterprises that manufacture aluminum in multiple stages and with multiple products. The lead times for the company's main product groups, architectural aluminum profiles, and accessories, were explored in depth using two predictive models. A third model was used to accurately forecast for aluminum accessories to avoid stock-outs, which have a significant impact on the lead times for orders placed. The case study's outcomes seem more than acceptable, outperforming the effectiveness of the technologies in use for demand forecasting and lead time prediction. The work's limitations include inaccurate lead time estimation; waiting periods between phases should be considered as this "dead time" influences the overall cycle time. In addition, the industrial environment is undergoing dynamic changes. Substantial changes in the system, such as the adoption of new planning methods and systems, are not easily accomplished by the algorithms in the absence of appropriate data, despite the fact that machine learning algorithms are good at recognizing the environment's modifications when being retrained.
Wong et al. [75] used artificial intelligence (AI) to show how supply chains (SCs) can respond dynamically to unstable conditions, reducing the need for small-to-medium-sized businesses (SMEs) to make potentially costly decisions. This work explores the effect of AI on SC risk management for SMEs, building on a resource-based perspective. Based on information gathered from executives, managers, and senior managers of SMEs, a structural model consisting of AI-risk management capabilities, SC re-engineering capabilities, and supply chain agility (SCA) was created and evaluated. Partial least squares-based structural equation modeling (PLS-SEM) and artificial neural networks (ANNs) are the key methodologies used in this study. According to the findings, SC re-engineering flexibility and capacities are influenced by the application of AI for risk management. Efficiency is further impacted and mediated by re-engineering abilities. Reliability was found for models A and B when PLS-SEM and ANN were compared. The SC's current levels of demand uncertainty put managers under pressure to make difficult trade-off decisions in a short amount of time. AI makes it feasible to model different situations to provide answers to important issues that outdated infrastructures cannot. In this work, non-linear correlations in the model were discovered and a multi-construct adaptability idea was combined. The limitation of the study is that other factors affecting the acceptance of technology, like culture, managerial dedication, and technological innovation, were not taken into account by the model.
Keerthana [76] applied different machine learning techniques for supply chain management. To close the supply-demand imbalance, the researcher created a complete system that uses an expandable deep neural network architecture. Depending on transaction records from already processed transactions, the architecture is able to examine a number of customized input items and proactively identify supply and demand trends. Based on a set of adaptable features, a general training model is created to forecast future demand, which is then put to the test. Integrating layers are applied to transfer high-dimensional features onto a small subdomain, resulting in a more compact representation, to bring together incoming data. The training framework of the model is made up of fully joined layers with connected activation functions. The limitation of the work is the low performance of the machine learning models used.
Abou Zwaida, Pham, and Beauregard [77] presented a study on a drug refilling optimization problem, a general model for drug inventory management in a hospital. To solve the optimization challenge of a drug shortage, they investigated the hospital's drug supply chain model and developed Dynamic Refilling dRug Optimization (DR2O). They simulate an objective function that seeks to reduce the restocking costs, which include the price of the drug itself, the cost of storing it, and the consequence of a deficit. They also took budgetary limitations and supply limits into account while refilling medications, such as storage capability. They also proposed the Deep Reinforcement Learning Model for Drug Inventory (DRLD), a deep learning approach based on RL and DNN in which the pharmaceutical problem is represented as a state in a Markov Decision Process (MDP). They searched for a proper measure to decide whether to supply based on each state in order to reduce the objective cost function. They created an online approach to system control based on the MDP model, where reward and Q-matrices are established to assess an action match for each state. They presented a DNN model that can approximate the Q-values after training and can understand the behavior of the system because of the large search state space in RL. Lastly, they looked at using a detailed simulation to carry out their work. They specifically compared their strategy to three baseline strategies, which included support functionality, ski rental, and max-min. In most analyses, their approach outperforms other methods considered in the paper, particularly in terms of lowering the cost of restocking and the scarcity issue. The limitation of the work is the relatively high rate of unexpected results.
Milani et al. [78] worked on the forecasting of supply chain management for noncommunicable diseases. The authors examine the applicability of forecasting modeling to non-communicable diseases (NCDs) in supply chain management, including the suitable techniques to gather and analyze the crucial data. In healthcare, a different approach has been put in place to forecast both vertical and horizontal supply chain interactions utilizing numerical forecasting models and machine learning. To foresee harmful medical events, the paper suggests various types of data gathering, analysis, and prediction methodologies. These methods may prove valuable in the healthcare supply chain, which includes manufacturers, wholesalers, marketers, and providers. The shortcoming of the work is that the performance of the machine learning and deep learning models used in the work was not reported.
Liotine [79] reviews the outcomes of the findings of a panel study conducted by the industry to determine how supply chain control tower (CT) deployments for the pharmaceutical business are affected by new autonomous intelligence technologies, such as artificial intelligence and machine learning. Such technologies have the capability to transform CTs into a model that allows for the collection, assessment, and decision-making of data in real time. This can be done by utilizing these technologies to handle decision intricacy better and carry out decisions at rates that people would normally find difficult to handle. The essential skills that must be enabled and the increased level of decision accessibility they offer are some of the main elements that have been recognized. They also considered some of the obstacles involved in accomplishing this, such as data quality and reliability, collaboration and data sharing across supply chain layers, system-to-system compatibility, decision-validation, and administrative effects, among several others. The drawback of the work is that no machine learning algorithm was implemented in the paper.
Shah [80] explains the essential components of every supply chain and the many tactics a drug manufacturer can employ to operate effectively. Artificial intelligence (AI) and machine learning (ML) tools are some of the more contemporary techniques that businesses utilize for supply chain optimization. Lead times are shortened because of these innovative technologies, which also assist in forecasting better routes in the future and drastically save expenses. The paper then examines paracetamol, a medicine that is extensively produced in India as an active pharmaceutical ingredient (API). The article offers suggestions for strategies to enhance the supply chain for this specific API in India by assessing its existing manufacturing across the supply chain process. Overall, increasing organizational profitability may be accomplished by strengthening India's supply chain management through the creation of cogent plans and the application of data-driven decision-making. The limitation of the work is that it is not scientific enough as there is no practical implementation of any AI-based algorithms for supply chain management in the paper.

Supply Chain Shipment Price Dataset
Supply chain management covers the transportation of goods and services and includes all processes that convert raw materials into completed products. It involves actively streamlining a company's supply-side procedures to maximize customer value and gain a market advantage. The supply chain shipment pricing dataset used in this investigation was collected from Kaggle [81] and contains 10,324 samples. The shipment mode, which was utilized as the target variable, has four classes: Air, Truck, Air Charter, and Ocean. Aircraft are used to deliver commodities in both the Air and Air Charter modes.
Air travel is without a doubt the quickest means of transportation. It is also highly convenient, since it does not have to contend with the many natural obstacles that affect other modes, which makes it the most accessible option for any location regardless of geographical challenges. Most items can be delivered via airfreight service, apart from exceptionally heavy products that would not fit inside the aircraft. Air transport is also recognized as the best mode of shipment for perishable goods.
Compared to other types of transportation, air shipping is frequently the most expensive. Items transported by air are often referred to as express shipping, since the shipment pace is faster and the products are delivered sooner, at a somewhat higher cost. Items shipped by air can be expected to arrive in 1 to 2 days. One of the oldest modes of moving products is by land, for example, by truck. For shipping products within a country or across borders, this is the most practical option. Trucks are commonly employed to move goods along highways because they have large cargo compartments that can accommodate heavier objects like building materials and even automobiles. This means of transportation is less expensive than the others, although the goods may take longer to arrive at their destination. Railways are another mode of land transportation. Rail freight offers a number of benefits because it is less expensive and can move bigger commodities across the country. Shipping through sea (Ocean) refers to shipping by sea for a variety of uses, such as commercial or military. It is a technique for shipping comparatively larger amounts of material using cargo ships, in which the cargo is first packed into containers before being loaded onto a vessel. Almost everything may be transported by water; however, shipping by sea is not advised if a product must be delivered fast.
As input variables, we utilized line-item value, line-item insurance, and line-item quantity. A Line Item is each distinct item listed on a Sales Order or Purchase Order; all the items requested are listed one after the other in the order. Line-item insurance refers to the coverage provided under the specific policy being purchased.

Long Short-Term Memory (LSTM)
Deep learning, commonly referred to as deep structured learning, uses neural networks with several layers. A recurrent neural network (RNN) is one such architecture that employs many networks in a loop, and it does better than standard neural networks at remembering information from prior occurrences. Because the networks form a closed loop, information can be retained. Each network in the loop receives information and input from the preceding network, takes the required action, and produces output while passing the information on to the next network. Some applications require only recent information, but others may need more historical data. Common recurrent neural networks struggle to learn when the gap between the moment the information is needed and the point at which it was seen grows large. However, a kind of RNN called the Long Short-Term Memory (LSTM) network [82] can learn such dependencies. These networks were created expressly to overcome the recurrent networks' problem with long-term dependencies. The ability to recall information over a long period of time is a strength of LSTMs.
Appl. Sci. 2022, 12, 10166
Because extra previous information may affect the validity of the model, LSTMs are an obvious option. A standard LSTM module, sometimes referred to as a repeating module, is made up of four neural network layers that interact in a special way. In the mathematical representation of the LSTM, $f_t$ is the forget gate at time step $t$, $i_t$ is the input gate, $c_t$ is the cell gate, $o_t$ is the output gate, $h_t$ is the hidden state, and $h_{t-1}$ is the hidden state at the previous time step $t-1$; $\alpha$ is the activation function; $W_f$, $W_i$, $W_c$, $W_o$ are the weights of the forget, input, cell, and output gates, respectively; and $b_f$, $b_i$, $b_c$, $b_o$ are the corresponding biases.
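For reference, a standard LSTM formulation consistent with the symbols defined above can be written as follows. This is a sketch under assumptions: the current input $x_t$, the candidate cell state $\tilde{c}_t$, the element-wise product $\odot$, and the use of $\tanh$ in the cell update are common conventions that are not named in the text, so the authors' exact formulation may differ.

```latex
\begin{align}
f_t &= \alpha\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \alpha\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \alpha\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
```

In this form, the forget gate $f_t$ decides how much of the previous cell state is kept, and the input gate $i_t$ decides how much new information is written, which is what lets the network retain information across long gaps.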

One Dimensional Convolutional Neural Network (1D-CNN)
A convolutional neural network (CNN) [83] can mine data in depth because of its powerful feature extraction capabilities. CNNs were developed for two-dimensional image processing: a convolution kernel glides across an image, extracting pixel information and enabling image classification and identification. A one-dimensional convolutional neural network (1D-CNN) is a modified form of CNN. The convolution layers of a 1D-CNN [84] have one-dimensional filters and a one-dimensional spectral input layer. A 1D-CNN is made up of convolutional layers, max-pooling layers, and fully connected layers.
In the mathematical representation of the 1D-CNN, $x$ is the input, $\beta$ is the activation function, $n$ is the number of feature maps in the layer, $W$ is the trainable one-dimensional convolutional kernel, $x_i$ is the $i$th feature map, $b_i$ is the bias of the $i$th feature map, and $o_i$ is the output of the $i$th neuron.
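The sliding-kernel operation of a one-dimensional convolutional layer, followed by max pooling, can be sketched in plain Python. This is a minimal illustration, not the network used in this study: it assumes ReLU as the activation function, a single input channel, stride 1, and no padding, and the names `conv1d` and `max_pool1d` and the toy signal are hypothetical.

```python
def relu(v):
    """Rectified linear unit, assumed here as the activation function."""
    return max(0.0, v)


def conv1d(x, kernel, bias=0.0, activation=relu):
    """Slide a 1D kernel over x (no padding, stride 1) and apply the activation."""
    width = len(kernel)
    return [
        activation(sum(w * x[i + j] for j, w in enumerate(kernel)) + bias)
        for i in range(len(x) - width + 1)
    ]


def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
feature_map = conv1d(signal, kernel=[0.5, 0.5], bias=-2.5)
pooled = max_pool1d(feature_map, size=2)
print(feature_map)  # [0.0, 0.0, 1.0, 2.0, 2.5]
print(pooled)       # [0.0, 2.0]
```

Each output value is the activation of a weighted sum over a short window of the input plus a bias, so a kernel of width 2 turns six inputs into five feature values. Pooling then keeps the largest response in each pair.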

Bayesian Optimization with Tree-Structured Parzen Estimators
The goal of hyper-parameter optimization in machine or deep learning is to determine the hyper-parameters of a particular algorithm that provide the highest performance when tested against a validation set. A number of hyper-parameters influence the predictive accuracy of the models, so it is critical to tune them in a reasonable manner. However, unlike conventional parameter optimization, hyper-parameter optimization is a combinatorial optimization problem that cannot be solved using the gradient descent approach. Furthermore, because every hyper-parameter alteration necessitates retraining to assess its effect, evaluating a collection of hyper-parameter configurations is computationally demanding. Bayesian optimization is used in this work along with probabilistic regression models such as tree-structured Parzen estimators. According to [85], the configuration space is limited to a tree-structured Parzen estimator, which gives a straightforward approach for determining the model and determining locally optimal hyper-parameter settings. In the mathematical formulation of Bayesian optimization using a tree-structured Parzen estimator, $a$ is the set of objective vectors, $l(b)$ is the probability density function, $g(b)$ is the probability density function of the remaining observations, $a_{\min}$ is the current minimum loss, $a(b)$ is the loss under the hyper-parameter setting $b$, $EI$ is the expected improvement, and $b^{*}$ is the locally optimal hyper-parameter setting.
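For reference, the usual tree-structured Parzen estimator formulation can be written with the symbols defined above. This is a sketch under assumptions: the quantile $\gamma = p(a < a_{\min})$ and the notation $b^{*}$ for the selected setting are standard conventions that are not named in the text, so the authors' exact expressions may differ.

```latex
\begin{align}
p(b \mid a) &=
  \begin{cases}
    l(b), & a(b) < a_{\min} \\
    g(b), & a(b) \ge a_{\min}
  \end{cases} \\
EI_{a_{\min}}(b) &= \int_{-\infty}^{a_{\min}} (a_{\min} - a)\, p(a \mid b)\, da
  \;\propto\; \left(\gamma + \frac{g(b)}{l(b)}\,(1 - \gamma)\right)^{-1} \\
b^{*} &= \arg\max_{b} \frac{l(b)}{g(b)}
\end{align}
```

Under this formulation, maximizing the expected improvement amounts to choosing settings that are likely under $l(b)$, the density of well-performing configurations, and unlikely under $g(b)$, the density of the remaining ones.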

Target Variable Class Imbalance
The above-mentioned target variable exhibits some class imbalance. For example, Figure 1 shows a class imbalance in shipping methods, with air accounting for 53.5 percent, truck for 34.1 percent, air charter for 8.0 percent, and ocean for 4.5 percent. To compensate for the imbalance, we employed AllKNN (All K Nearest Neighbor) [44]. AllKNN differs from other under-sampling techniques in that the number of neighbors used by its internal nearest-neighbor algorithm is increased at each iteration. The strategy applies the nearest-neighbor rule to each class according to how difficult its samples are to learn, and its effect is to remove noisy data around the class borders. This makes deep learning more successful at recognizing underrepresented classes. Figure 1 also shows that air transport has been a significant mode of transportation, since it is very fast, which is particularly useful in an emergency. Aircraft can deliver medications and patients to the required location quickly because air routes are not congested.
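The idea of raising the neighborhood size at each iteration can be sketched in a few lines of pure Python. This is a deliberately simplified illustration, not the imblearn implementation used in the study: it works on one-dimensional points, uses a majority vote, always keeps the minority class, and every name and data value below is hypothetical.

```python
from collections import Counter


def nearest_labels(points, labels, index, k):
    """Labels of the k nearest neighbours of points[index], excluding itself."""
    distances = sorted(
        (abs(points[index] - p), j) for j, p in enumerate(points) if j != index
    )
    return [labels[j] for _, j in distances[:k]]


def all_knn_undersample(points, labels, max_k=3):
    """Simplified AllKNN: repeat nearest-neighbour cleaning for k = 1..max_k.

    A sample from a non-minority class is dropped when the majority vote of
    its k nearest neighbours disagrees with its own label.
    """
    points, labels = list(points), list(labels)
    # Assumption of this sketch: the minority class is fixed up front and never removed.
    minority = min(Counter(labels).items(), key=lambda item: item[1])[0]
    for k in range(1, max_k + 1):
        keep = []
        for i, label in enumerate(labels):
            if label == minority:
                keep.append(i)
                continue
            votes = Counter(nearest_labels(points, labels, i, k))
            if votes.most_common(1)[0][0] == label:
                keep.append(i)
        points = [points[i] for i in keep]
        labels = [labels[i] for i in keep]
    return points, labels


# Hypothetical 1-D data: one noisy "air" sample (101) sits inside the "ocean" cluster.
xs = [0, 1, 2, 3, 4, 101, 100, 102, 103]
ys = ["air"] * 6 + ["ocean"] * 3
clean_xs, clean_ys = all_knn_undersample(xs, ys)
print(Counter(clean_ys))
```

In this toy run only the noisy majority-class sample near the minority cluster is removed, which is the border-cleaning effect described above.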

Model Specification
We compared the performance of two widely used deep learning techniques: long short-term memory (LSTM) and a one-dimensional convolutional neural network (1D-CNN). Deep learning, or deep structured learning, refers to neural network types with many layers, such as LSTM or 1D-CNN. When it comes to recalling information from prior occurrences, recurrent networks perform better than ordinary neural networks. The network's closed-loop design retains the information. Every unit in the loop receives input from the preceding unit, performs the specified operation, and passes its output to the next unit. While some applications require only the most recent data, others may call for more historical data. Such cases can be learned by Recurrent Neural Networks (RNNs), specifically Long Short-Term Memory (LSTM) networks [86]. These networks are specially designed to circumvent the long-term dependency problem of recurrent networks, and they are effective in recalling information. Because the validity of the model may be affected by additional past information, LSTMs are an obvious option. The LSTM block contains four gates: the cell gate $g$ retains information over time, the forget gate $g_f$ regulates the extent to which a value is maintained in the cell, the input gate $g_i$ controls the extent to which a value flows into the cell, and the output gate $g_o$ controls the extent to which the value in the cell is used to compute the output. Each gate consists of a fully connected layer and an activation function. The LSTM block also has three inputs, namely the cell state $s_{t-1}$, the previous hidden state $h_{t-1}$, and the current input $x_t$, as well as three outputs, namely the cell state $s_t$, the hidden state $h_t$, and the current output $y_t$. The current output is computed from the hidden state.
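The interplay of the four gates can be illustrated with a single scalar LSTM step in pure Python. This is a sketch of the standard LSTM update, assuming sigmoid activations for the forget, input, and output gates and tanh for the cell gate; the parameter layout and names are hypothetical and are not taken from the authors' code.

```python
import math


def sigmoid(z):
    """Logistic function, squashing z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, s_prev, params):
    """One scalar LSTM step.

    params maps a gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h_t and the new cell state s_t.
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x_t + w_h * h_prev + bias

    g_f = sigmoid(pre_activation("forget"))   # how much of s_prev to keep
    g_i = sigmoid(pre_activation("input"))    # how much new content to admit
    g_o = sigmoid(pre_activation("output"))   # how much of the cell to expose
    g = math.tanh(pre_activation("cell"))     # candidate cell content

    s_t = g_f * s_prev + g_i * g
    h_t = g_o * math.tanh(s_t)
    return h_t, s_t


# Hypothetical parameters: with all weights zero, every sigmoid gate equals 0.5.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "cell")}
h_t, s_t = lstm_step(x_t=1.0, h_prev=0.0, s_prev=2.0, params=zero_params)
print(h_t, s_t)
```

With all-zero parameters the forget gate halves the previous cell state and the candidate content is zero, so the example makes the gating arithmetic easy to check by hand.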
The cell-state update of the LSTM unit is formulated as follows:

$$s_t = g_f \odot s_{t-1} + g_i \odot g \qquad (13)$$

The proposed 1D-CNN, on the other hand, is made up of 12 layers: five 1D convolutional layers, two dropout layers, one max-pooling layer, one flatten layer, and three fully connected layers. The signals are first routed through the first convolutional layer, which has 32 filters. The kernel size of the first layer of the proposed 1D-CNN is set to 3, meaning that the same weights are shared at every stride across the input signal. The number of filters is raised from 32 to 64 in the second 1D convolutional layer, and the kernel size is set to 3 in all 1D convolutional layers. The number of filters is set to 128 in the third 1D convolutional layer, and the padding is set to "same" in all three of these layers. The fourth layer is a max-pooling layer that down-samples the representation produced by layer three; its pooling size is set to 3, its stride to 2, and its padding to "same". The fifth and seventh layers are convolutional layers whose numbers of filters are increased to 256 and 512, respectively, with "same" padding and a kernel size of 3. The sixth and eighth layers are dropout layers with a rate of 0.2, placed after the fifth and seventh layers, respectively, to reduce overfitting. The ninth layer is a flatten layer that reduces its input to a single dimension. The flattened output is fed into the fully connected layer 10, which has 512 nodes, and the leaky ReLU activation function is used wherever an activation is applied, except in the last layer. Layer 11 is fully connected with 256 nodes. After passing through layer 11, the data are delivered to the last fully connected layer, which uses a linear activation function for the final prediction. Figure 2 is the block diagram of the proposed model.
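The 12-layer stack described above can be traced layer by layer to see how the tensor shape evolves. The following pure-Python sketch only tracks shapes and does not train anything. The input length of 20 with one channel and the four output units (one per shipment mode) are assumptions made for illustration, since the text does not state them.

```python
import math

# The 12 layers as described in the text; the final unit count is an assumption.
LAYERS = [
    ("conv1d", {"filters": 32, "kernel": 3}),
    ("conv1d", {"filters": 64, "kernel": 3}),
    ("conv1d", {"filters": 128, "kernel": 3}),
    ("maxpool", {"pool": 3, "stride": 2}),
    ("conv1d", {"filters": 256, "kernel": 3}),
    ("dropout", {"rate": 0.2}),
    ("conv1d", {"filters": 512, "kernel": 3}),
    ("dropout", {"rate": 0.2}),
    ("flatten", {}),
    ("dense", {"units": 512}),
    ("dense", {"units": 256}),
    ("dense", {"units": 4}),  # assumption: four shipment-mode classes
]


def trace_shapes(length, channels, layers=LAYERS):
    """Return (layer kind, output shape) pairs, assuming 'same' padding throughout."""
    shape = (length, channels)
    trace = []
    for kind, cfg in layers:
        if kind == "conv1d":
            # 'same' padding with stride 1 keeps the length; channels become filters.
            shape = (shape[0], cfg["filters"])
        elif kind == "maxpool":
            # 'same' padding: output length is ceil(input length / stride).
            shape = (math.ceil(shape[0] / cfg["stride"]), shape[1])
        elif kind == "flatten":
            shape = (shape[0] * shape[1],)
        elif kind == "dense":
            shape = (cfg["units"],)
        # dropout leaves the shape unchanged
        trace.append((kind, shape))
    return trace


shape_trace = trace_shapes(length=20, channels=1)
for kind, shape in shape_trace:
    print(f"{kind:8s} -> {shape}")
```

Under these assumptions the flatten layer emits 10 x 512 = 5120 values, which shows that the first fully connected layer carries most of the trainable weights.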
The mathematical expressions for the 1D-convolutional neural network ($y$), Leaky ReLU activation ($\beta$), max pooling ($mp_e$), and dropout ($d$) layers are as follows:

$$mp_e = \max\left(mp_{e_1} : e \le e_1 < e + s\right)$$
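The max-pooling expression above and the dropout layer can be sketched in pure Python as follows. This is an illustrative sketch, not the authors' code: the pooling here uses no padding (unlike the "same" padding described for the model), dropout is written in the common "inverted" form that rescales the surviving values during training, and all names and values are hypothetical.

```python
import random


def max_pool1d(values, size, stride):
    """For each window start e, take the max over e <= e1 < e + size."""
    return [
        max(values[e:e + size])
        for e in range(0, len(values) - size + 1, stride)
    ]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`,
    and divide the survivors by (1 - rate) to keep the expected sum unchanged."""
    keep_probability = 1.0 - rate
    return [
        v / keep_probability if rng.random() >= rate else 0.0
        for v in values
    ]


pooled = max_pool1d([1, 3, 2, 5, 4, 6, 0], size=3, stride=2)
print(pooled)  # [3, 5, 6]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
dropped = dropout([1.0, 2.0, 3.0, 4.0], rate=0.2, rng=rng)
print(dropped)
```

At inference time dropout is normally switched off, so the layer passes its input through unchanged.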

Repeated k-Fold Cross Validation
To obtain a more reliable estimate of the predictive performance of the deep learning [87,88] models, repeated k-fold cross-validation is utilized in this study. It may be used for both regression and classification models [89][90][91][92]. The procedure simply repeats k-fold cross-validation several times and reports the mean result across all folds from all runs. The supplied dataset is divided into k folds (subsets) according to the first parameter, k, which is set to 5. The model is trained on k-1 subsets, and its performance is assessed on the remaining subset. These stages are repeated twice, as determined by the algorithm's second parameter. Each repetition of the repeated k-fold cross-validation is a conventional k-fold procedure. To begin, the dataset is divided into k subsets, each of which is randomly assigned a number between 1 and 5. One subset is used as the validation set, with the remaining subsets being used for training. The model is trained on the training subsets and assessed on the validation (test) set. The prediction error is determined, and the step is performed k times so that each subset serves once as the validation set. Finally, the total prediction error is calculated by averaging the prediction errors over all folds and repetitions.
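The procedure above, with k = 5 and two repetitions, can be sketched in pure Python as follows. This is a minimal illustration rather than the code used in the study; the function names, the seed, and the placeholder `evaluate` callback are hypothetical. In practice a library class such as scikit-learn's `RepeatedKFold(n_splits=5, n_repeats=2)` serves the same purpose.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=2, seed=0):
    """Yield (train_indices, validation_indices) for each fold of each repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)                      # new random partition per repetition
        folds = [indices[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield train, validation


def cross_validate(n_samples, evaluate, k=5, repeats=2, seed=0):
    """Average the score returned by evaluate(train, validation) over all splits."""
    scores = [
        evaluate(train, validation)
        for train, validation in repeated_kfold(n_samples, k, repeats, seed)
    ]
    return sum(scores) / len(scores), scores


# Placeholder evaluation: a real one would train a model on `train`
# and return its prediction error on `validation`.
splits = list(repeated_kfold(20))
mean_error, errors = cross_validate(20, lambda train, validation: len(validation) / 20)
print(len(splits), mean_error)
```

With five folds and two repetitions, each model is trained and evaluated ten times, which is why the result tables report ten rows per model.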

Other Related Works
Chen et al. [93] proposed a new disease diagnosis and treatment recommendation system to make the best use of the sophisticated medical equipment found in modern hospitals and the depth of expertise of skilled physicians. First, a Density-Peaked Clustering Analysis (DPCA) technique for illness-symptom clustering is proposed in order to identify disease symptoms more precisely and effectively. Additionally, the Apriori algorithm is used to perform association analyses on disease-diagnosis (D-D) rules and disease-treatment (D-T) rules independently. The system recommends a suitable diagnosis and treatment plan to patients and novice doctors, even in a constrained therapeutic setting. A parallel solution was also implemented using the Apache Spark cloud platform in order to achieve high throughput and low response latency. Comprehensive experimental results show that the proposed system efficiently achieves illness-symptom clustering and provides intelligent and precise disease treatment suggestions. The proposed system's weakness is the lack of evaluation of the efficiency of the disease diagnostic and treatment methods.
Wang et al. [94] developed a novel system for solving the problem of incremental group-level popularity prediction (IGPP). The two key phases are restarting CP decomposition to reduce cumulative error and forecasting progressively by means of incremental CP decomposition. Extensive empirical studies show that IGPP performs better than other baselines in terms of forecast accuracy and running time. The study concentrated mostly on investigating dynamic diffusion along the temporal dimension. The authors also expanded the applications of their incremental methodology in big data environments and explored more general incremental approaches that can describe groups evolving over time.
Chen et al. [95] proposed a Periodicity-based Parallel Time Series Prediction (PPTSP) technique for large-scale time-series data, implemented in the Apache Spark cloud computing environment. A Time Series Data Compression and Abstraction (TSDCA) approach is described to manage the enormous historical datasets efficiently; this algorithm can scale down the data while properly retaining its features. On this basis, they proposed a Multi-layer Time Series Periodic Pattern Recognition (MTSPPR) algorithm employing the Fourier Spectrum Analysis (FSA) technique. A Periodicity-based Time Series Prediction (PTSP) algorithm is also proposed. The models for all prior periods are used to forecast data for the later period, and a temporal attenuation factor is added to control the influence of the various periods on the outcome of the prediction. Additionally, they developed a parallel approach on the Apache Spark platform, utilizing the Streaming real-time computing module, to enhance the performance of the proposed algorithms. Extensive experimental results demonstrate that, in terms of prediction accuracy and performance, the PPTSP approach has a significant edge over competing algorithms.
Pu et al. [96] developed a new attention convolutional neural network (named ED-ACNN), based on an encoder-decoder architecture, for anticipating the movement of people in every area of a city center using historical human traffic data. The proposed system is capable of learning the spatial and temporal interrelations of vehicular images, including proximity, period, and pattern. The effectiveness of the method was assessed using three different real-world datasets from Beijing and New York City. It outperformed ten widely used baselines in terms of accuracy and efficiency, indicating that the suggested approach is more suitable for predicting traffic flow. Two distinct forms of population flow in Beijing and New York City were thoroughly evaluated experimentally, and the findings demonstrate that the suggested approach is highly competitive with state-of-the-art baselines.
Fillipe et al. [97] applied long short-term memory (LSTM) as a model for forecasting time series. The model focused on a large volume of data from a time series characterized by nonlinearities. Oyewola et al. [98] developed a novel Auditory Algorithm, which follows the pathway of the auditory system, like that of the human ear. The results show that the auditory algorithm performs better than the other algorithms considered in that paper.

Results and Discussion
In this section, the dataset and the developed models, namely Long Short-Term Memory (LSTM) and One Dimensional Convolutional Neural Network (1D-CNN), were subjected to repeated k-fold cross-validation. The work was implemented in Python 3.6 using the numpy, sklearn, keras, imblearn, pandas, matplotlib, seaborn, and plotly packages. The hyperopt library in Python is used to perform Bayesian hyperparameter optimization. The supply chain dataset's missing values were visualized using a heatmap. The heatmap in Figure 3 shows two colors: red for missing data and green for the remaining, non-NaN values. Shipment Mode, Dosage, and Line-Item Insurance contain missing values.
Correlation heatmaps display the degree of correlation between numerical variables, visualizing both the links between variables and their strength. Every numerical variable in a correlation plot is typically represented by a column, and the rows reflect the connections between each pair of variables. The values in the cells represent the strength of the correlation: positive values represent a positive relationship, while negative values represent a negative association. Correlation plots may be used to discover linear and nonlinear relationships, as well as outliers, and the color coding of the cells makes it easy to spot relationships between variables quickly. As seen in Figure 4, Line-Item Quantity is substantially correlated with Line-Item Insurance, with a correlation value of 0.8. Meanwhile, Pack Price has a correlation coefficient of 0.6 with Unit Price, although Unit Price is uncorrelated with Unit of Measure or ID.
The data peaks are displayed using a violin plot, which is a mix between a box plot and a kernel density plot. It serves as a representation of the distribution of numerical data. Violin plots display summary statistics as well as the density of each variable, as opposed to box plots, which can only offer summary statistics. The median can be seen as a white dot in violin plots. The wide gray bar in the middle displays the interquartile range, while the narrow gray line represents the remaining portion of the distribution. On either side of the gray line, a kernel density estimate is displayed to show how the data are distributed. The violin plot has broader and skinnier parts, with wider areas representing a greater likelihood that members of the population take the given value and skinnier areas representing a lower probability. As seen in Figure 5, the median Pack Price for Pediatric and Adult is about 1, with a high density, whereas the median Pack Price for ACT is around 45, with a lower density. Furthermore, the median Pack Price for Malaria is about 30, also with a lower density.
The performance of the deep learning models used in this study was estimated with the aid of repeated k-fold cross-validation. This merely involves repeating the cross-validation process and reporting the average outcome across all folds from all runs. The supply chain shipping data are separated into five folds and the process is repeated twice, as shown in Tables 1-6. When training our models, we must consider loss and accuracy. Loss is a metric that represents the sum of our model's errors and determines how well or poorly the model is performing. Accuracy assesses how well our model predicts by comparing model predictions to true values in percentage terms. Table 1 displays the loss and accuracy of Long Short-Term Memory (LSTM). The loss is within the range of 1.3907 and 1.4682, which means that the model does not perform well. The accuracy in Table 1 is within the range of 50% and 55%, which is low. Since the loss is high and the accuracy is low, the LSTM model is not performing well. Table 2 displays the loss and accuracy of the One-Dimensional Convolutional Neural Network (1D-CNN).
The loss is within the range of 1.3 and 1.4, while the accuracy is within the range of 52% and 55%. The loss remains high and the accuracy low, which means that 1D-CNN is not performing well either.
Under-sampling balances an uneven dataset by keeping all the data from the minority class and reducing the size of the majority class. As demonstrated in Figure 1, there were uneven classes in target variables such as shipment mode. The dataset was therefore under-sampled using AllkNN. The loss and accuracy of LSTM and 1D-CNN with AllkNN are shown in Tables 3 and 4. The accuracy of LSTM has increased from 50% to 63%, indicating that the model outperforms the prior model.
To fine-tune the deep learning methods in Tables 3 and 4, Bayesian Optimization (BO) using a tree-structured Parzen estimator was used. Bayesian Optimization is a method for finding the minimum or maximum of an objective function that uses Bayes' theorem to guide the search. In many real-world analytics applications, optimizing a function is critical. By optimization, we mean determining the maximum or minimum of the objective function over a certain set of parameter combinations. Tables 5 and 6 show the outcomes of hyperparameter tuning for LSTM+AllkNN and 1D-CNN+AllkNN. Tables 5 and 6 show a slight improvement in accuracy, indicating that the tuned models perform better. Table 7 shows all of the parameters used in LSTM+AllkNN and 1D-CNN+AllkNN before and after Bayesian Optimization using the tree-structured Parzen estimator. After using Bayesian Optimization, there is a modest rise in the parameter values.
Figure 6 depicts the pie chart of the overall loss of LSTM, 1D-CNN, LSTM+AllkNN, 1D-CNN+AllkNN, LSTM+AllkNN+BO, and 1D-CNN+AllkNN+BO. 1D-CNN+AllkNN has a relatively low percentage loss compared to other models, and 1D-CNN+AllkNN+BO has the lowest share of the loss with 15.3%, while LSTM and 1D-CNN have the highest with 18.3%. Figure 7 depicts the overall accuracy of LSTM, 1D-CNN, LSTM+AllkNN, 1D-CNN+AllkNN, LSTM+AllkNN+BO, and 1D-CNN+AllkNN+BO. The ranking is topped by 1D-CNN+AllkNN, which accounts for the largest share at 17.6 percent. This indicates that the model performs well relative to the other models on the supply chain dataset.
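The loss and accuracy metrics discussed in this section can be sketched in pure Python as follows. The sketch assumes that the reported loss is a categorical cross-entropy over the four shipment modes, which the text does not state explicitly; the function names and the example probabilities are hypothetical.

```python
import math


def cross_entropy(probabilities, true_classes):
    """Mean categorical cross-entropy: -log of the probability given to the true class."""
    total = sum(-math.log(p[t]) for p, t in zip(probabilities, true_classes))
    return total / len(true_classes)


def accuracy_percent(probabilities, true_classes):
    """Percentage of samples whose most probable class equals the true class."""
    hits = sum(
        1 for p, t in zip(probabilities, true_classes) if p.index(max(p)) == t
    )
    return 100.0 * hits / len(true_classes)


# Reference point: a model that guesses uniformly over four classes.
uniform_guess = [[0.25, 0.25, 0.25, 0.25]] * 3
chance_loss = cross_entropy(uniform_guess, [0, 1, 2])
print(round(chance_loss, 4))  # 1.3863, i.e. ln(4)

# Hypothetical predictions for two samples whose true classes are 0 and 2.
example_probs = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]]
example_accuracy = accuracy_percent(example_probs, [0, 2])
print(example_accuracy)  # 50.0
```

Under that assumption, a uniform guess over four classes yields a loss of ln 4, roughly 1.386, which offers a reference point for interpreting the loss values reported in the tables.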

Conclusions
This research was conducted with the goal of enhancing supply chain management and saving time and money by doing away with manual intervention. Based on our findings, we demonstrated that Bayesian Optimization with the tree-structured Parzen estimator and AllkNN may be used to optimize deep learning models such as Long Short-Term Memory (LSTM) and One Dimensional Convolutional Neural Network (1D-CNN). The experimental results showed that combining 1D-CNN, AllkNN, and Bayesian Optimization with a tree-structured Parzen estimator may increase classification accuracy on the supply shipment price dataset. This research, like any other study, has some limitations. There is a chance that bias crept in because all of our samples were gathered via the Kaggle website. Furthermore, despite our best efforts, it is possible that the sample we gathered from the industry was not large enough to reflect the whole sector. Due to these limitations, further research must be done with larger samples across other industries in order to gain fresher insights.
In the future, we hope to broaden our empirical studies to include other, larger configuration spaces, as well as increase the number of iterations and datasets. We may also mitigate the problem of overfitting in deep learning by reshuffling the train and validation split for each function evaluation. Classical machine learning techniques are an area of interest that we have not addressed; for supply chain management classification, such techniques can also be combined with Bayesian Optimization approaches. The goal of Bayesian model-based optimization is to minimize the number of times the objective function must be run by evaluating only the set of hyperparameters that has shown the most promise in prior calls to the evaluation function. The Tree-structured Parzen Estimator (TPE) is a sequential model-based optimization method in which models are constructed progressively to estimate the performance of hyperparameters based on previous measurements, and new hyperparameters are then chosen for testing based on this model.

Funding: This research received no external funding, and the APC was funded by Virginia Tech University research support to O.E.

Data Availability Statement:
The dataset used was obtained from Kaggle (a freely available online open-access database); https://www.kaggle.com/code/divyeshardeshana/supply-chain-shipment-price-dataanalysis/data (accessed on 10 August 2022). There is no need to obtain any consent whatsoever.