
A Review on AI for Smart Manufacturing: Deep Learning Challenges and Solutions

Applied Network Technology Lab, Huawei Technologies Duesseldorf GmbH, Riesstrasse 25, 80992 Munich, Germany
Fraunhofer Institute for Open Communication Systems, Kaiserin-Augusta-Allee 31, 10589 Berlin, Germany
Laboratory of Process Automation System, TU Dortmund University, Emil-Figge-Strasse 70, 44227 Dortmund, Germany
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(16), 8239;
Submission received: 18 July 2022 / Revised: 8 August 2022 / Accepted: 12 August 2022 / Published: 17 August 2022
(This article belongs to the Special Issue Applied Artificial Intelligence (AI))


Artificial intelligence (AI) has been successfully applied in industry for decades, ranging from the emergence of expert systems in the 1960s to the wide popularity of deep learning today. In particular, inexpensive computing and storage infrastructures have moved data-driven AI methods into the spotlight to aid increasingly complex manufacturing processes. Despite the recent hype, however, non-negligible challenges remain when applying AI to smart manufacturing applications. As far as we know, no work in the literature summarizes and reviews the related works for these challenges. This paper provides an executive summary of AI techniques for non-experts with a focus on deep learning and then discusses the open issues around data quality, data secrecy, and AI safety that are significant for fully automated industrial AI systems. For each challenge, we present the state-of-the-art techniques that provide promising building blocks for holistic industrial AI solutions, together with the respective industrial use cases from several domains, in order to give a concrete view of these techniques. All the examples we reviewed were published within the last ten years. We hope this paper can provide readers with a reference for further study of the related problems.

1. Introduction

Artificial intelligence (AI) is one of the most discussed and pursued topics of today’s research. In particular, deep learning (DL), a subfield of AI (see Figure 1), is showing impressive performance that is, in certain cases, superior to that of human beings. For instance, AlphaGo defeated Go game masters, and Google Vision achieved higher accuracy than humans in image recognition [1,2]. Consequently, the potential of DL has been widely recognized by both academia and industry, and DL is now being applied to a number of domains such as personal entertainment, healthcare, and smart manufacturing.
Industry 4.0, Advanced Manufacturing Partnership, and Made in China 2025 are political digitalization programs that intend to push modern industrial processes towards fully automated and self-adaptive levels. For this, the physical world of materials, machines, and products is connected to the virtual world of computers through networked sensors and actuators within the production processes using the concepts of the Internet of Things (IoT) and cyber-physical systems (CPS). Consequently, an unprecedented volume of data is continuously generated and calls for effective processing that enables fast response time, intelligent decision-making, and global optimization. Hence, data-driven AI techniques, such as deep learning, have great potential and have been applied to many smart manufacturing applications, as well as other industrial domains, such as mining and transportation. In the report provided by Deloitte [3], it was estimated that the market size of AI adoption in manufacturing would grow to 2057 million dollars in China in the year 2025. They have also found that 83% of their surveyed companies think AI will make or has made an impact. Similarly, according to Microsoft’s report [4], 81% of the 86 companies they surveyed have seen AI become more important in 2020. The main application categories of AI in smart manufacturing include predictive maintenance, quality assurance, yield enhancement, collaborative robots, as well as supply chain and warehouse management.
Early predictive maintenance can shorten downtime and reduce economic costs. AI enables predictive maintenance to avoid future faults and helps to automate fault diagnosis for manufacturing equipment [5,6,7]. Quality assurance aims at automated product quality checks. In particular, image-based techniques (visual inspection) are widely applied [8,9,10]. At present, yield enhancement AI applications are concentrated in the integrated circuit manufacturing industry due to its high digitalization level, complicated manufacturing technologies, and expensive yield losses. The main ideas for yield enhancement are to identify yield losses and remove the causes behind them [11,12,13]. Industrial robots are key components for improving efficiency in manufacturing and are supposed to adapt to environmental changes instead of having hard-coded exact poses for each movement. This can be achieved through vision-guided motion control [14]. Moreover, robots are expected to work collaboratively with humans and other robots in a shared environment while maintaining human safety [15]. Applications for supply chain management can be categorized into the domains of planning, supplier selection, logistics, and warehouse management. AI can forecast inventory and demand to improve planning with proactive and agile decision-making [16]. Similarly, AI can help to select suppliers by predicting their reliability [17]. Logistics places emphasis on route optimization for fast and economical transportation [18]. AI for warehouse management is more general and aims to enable fully automated warehouse operation, including inventory forecasting, warehouse robots, and logistic optimization [19].
However, developers still encounter fundamental obstacles when building deep learning solutions for smart manufacturing applications. Deep learning is a data-hungry technique that requires sufficient pre-collected data. This requirement is, however, hard to fulfil in practice due to the lack of data sampling and storage infrastructures, the short development time expected by users, changing working conditions, and the cost of data pre-processing and labeling. Therefore, real-world data can be of low volume, unlabeled (for supervised learning), and spatially and temporally evolving. Furthermore, industrial data are normally secrecy-sensitive, i.e., enterprises are not willing to share the data with outside entities due to business interests.
In this view, we categorize these obstacles into three main areas: data quality, data secrecy, and AI safety. This paper aims to demystify deep learning for non-experts through an informative introduction to its underlying concept, neural networks, so that the challenges can be better understood by the systems and industrial research communities. Furthermore, we present the state-of-the-art techniques that form the basis for tackling these challenges. While multiple surveys on AI for smart manufacturing exist, they either focus on individual applications or technical domains, such as predictive maintenance in [20] and machine-learning-based software development in [21], or introduce only basic deep learning techniques and their industrial applications [22,23,24,25]. We list the related survey articles on deep learning in smart manufacturing in Table 1. As far as we know, no existing work identifies the three main practical challenges for deep learning in smart manufacturing and extensively discusses their potential solutions based on the latest research developments.
This paper is organized as follows: Section 2 gives an overview of the basics of deep learning; Section 3 reviews the challenges of data quality; Section 4 summarizes the challenges in data secrecy; Section 5 is a survey on the trustworthiness of deep learning; Section 6 is the final summary and conclusion.

2. Deep Learning Overview

Deep learning is a class of machine learning (ML) algorithms based on deep neural networks (DNNs). ML computation can be divided into a training phase and an inference phase. Training configures the machine learning model using training data (see Section 2.2), whereas inference lets the trained model make predictions on new data. According to how the model is trained, ML algorithms can be categorized into three classes: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning aims at mapping inputs to desired outputs (labels), such as in image-based object recognition. It is “supervised” because the training data contain manually annotated labels, e.g., object classes. Supervised learning is usually utilized for data classification and regression. Conversely, unsupervised learning is label-free: it exploits the internal characteristics of the data and does not require manual annotation. It is often applied for data clustering and dimension reduction, which maps high-dimensional data to lower-dimensional representations. Reinforcement learning differs from both in that it aims to enable machines to learn to take appropriate actions. It is based on the philosophy of letting the machine obtain positive feedback (rewards) from the environment when it takes the right actions according to the environmental status, and negative feedback otherwise. Reinforcement learning is usually utilized for tasks related to planning and control.
In addition to deep learning, other ML algorithms exist—often referred to as “traditional”—such as linear regression and support vector machines for supervised learning, K-nearest neighbor and principal component analysis for unsupervised learning, as well as Q-learning for reinforcement learning. For more details on these algorithms, see [37].

2.1. Neural Networks

Deep learning fully relies on DNNs, which are derived from standard neural networks (NNs). Neural networks consist of connected artificial neurons, which are mathematical functions, as illustrated in Figure 2a. The neurons are normally grouped and connected layer-wise, where the neuron outputs (signals) of one layer are fed to the next layer. For instance, the inputs $x_1$, $x_2$, and $x_3$ form a layer in Figure 2b. The weights and biases are the parameters of an NN that are configured during training. In other words, a neural network is a nested non-linear transformation of the inputs. In neural networks, all the layers between the first and last layers are called hidden layers. Neural networks with more than two hidden layers are called deep neural networks. Otherwise, they are shallow neural networks.
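The layer-wise, nested transformation described above can be sketched in a few lines. This is only an illustrative sketch: the tanh activation, the network shape (three inputs, two hidden neurons, one output), and all weight and bias values are assumptions chosen for the example, not taken from Figure 2.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum plus bias, then a non-linear activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)  # tanh is an assumed activation function

def layer(inputs, weight_rows, biases):
    """A layer applies several neurons to the same input signals."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Feed the outputs of each layer into the next one (nested transformation)."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal

# Hypothetical parameters: 3 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.2]),
]
output = forward([1.0, 2.0, 3.0], network)
```

Training, discussed next, would adjust the numbers stored in `network`; the structure of `forward` stays the same.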

2.2. Model Training

During neural network training, the key objective is to minimize the so-called loss function through an appropriate choice of the model parameters. Loss functions are usually defined according to the goals to be achieved through the neural networks, such as the discrepancies between the predictions and labels for supervised classification, and they indicate how well the neural networks perform the tasks. Additionally, the loss functions can include numerical regularization of the neural network parameters, such as weight sparsity. However, loss functions are usually non-convex and may have many local minima, as depicted in Figure 3. Moreover, DNNs usually hold thousands to millions of parameters. Searching for the global loss minimum analytically is, therefore, virtually infeasible. Hence, DNNs are usually trained in an iterative fashion based on gradient descent methods. During each iteration, the parameters are updated slightly in the direction in which the loss drops fastest, which is given by the negative of the partial derivative with respect to each parameter, as shown in (1). Vector $\mathbf{p}$ holds the DNN parameters, and $\mathbf{p}'$ stands for the updated parameters after each iteration. $\alpha$ is the learning rate that adjusts the step size for each update, and $L(\mathbf{p})$ is the loss function, which depends on the parameters.
$$\mathbf{p}' = \mathbf{p} - \alpha \nabla_{\mathbf{p}} L(\mathbf{p}) \qquad (1)$$
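As a minimal sketch of the update rule in (1), the following loop repeatedly subtracts the learning rate times the gradient from the parameters. The quadratic loss, its analytic gradient, the starting point, and the values of `alpha` and `steps` are assumptions made purely for illustration; real DNN losses are non-convex and their gradients are obtained by back-propagation.

```python
def gradient_descent(grad, p, alpha=0.1, steps=100):
    """Repeatedly apply the update p' = p - alpha * grad(p)."""
    for _ in range(steps):
        g = grad(p)
        p = [pi - alpha * gi for pi, gi in zip(p, g)]
    return p

# Hypothetical convex loss L(p) = (p0 - 3)^2 + (p1 + 1)^2 and its analytic gradient.
def loss_grad(p):
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]

p_final = gradient_descent(loss_grad, [0.0, 0.0])
```

With this toy loss the iterates approach the minimum at (3, -1). A larger `alpha` takes bigger steps but can overshoot, which is why the learning rate usually has to be tuned.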
To compute the gradients, the data samples are firstly forward-passed to obtain the outputs, i.e., similar to inference, and then the loss values and gradients are computed. Due to the layer-wise structure of DNNs and the large number of parameters to optimize, the gradients are normally efficiently computed using the chain rule and back-propagation strategies. The gradients in the last layer can be directly computed since they are spatially closest to the outputs and loss functions, as illustrated in Figure 4. The gradients in the earlier layers are then derived in sequence layer-by-layer based on the gradients computed in later layers.
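The layer-by-layer gradient computation can be illustrated on a tiny network with one hidden neuron and one output neuron. The network shape, the tanh activation, the squared-error loss, and all numeric values are assumptions for this sketch; the finite-difference check is included only to show that the chain-rule gradients are plausible.

```python
import math

def forward(params, x):
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)   # hidden layer
    y = w2 * h + b2              # output layer
    return h, y

def loss(params, x, target):
    _, y = forward(params, x)
    return (y - target) ** 2

def backward(params, x, target):
    """Gradients via the chain rule, starting at the layer closest to the loss."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dL_dy = 2.0 * (y - target)
    dL_dw2 = dL_dy * h
    dL_db2 = dL_dy
    dL_dh = dL_dy * w2                 # propagate back to the hidden layer
    dL_dz = dL_dh * (1.0 - h * h)      # derivative of tanh
    dL_dw1 = dL_dz * x
    dL_db1 = dL_dz
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]

def numerical_grad(params, x, target, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads

params = [0.4, -0.1, 0.7, 0.2]   # hypothetical w1, b1, w2, b2
analytic = backward(params, x=1.5, target=1.0)
numeric = numerical_grad(params, x=1.5, target=1.0)
```

Note how `dL_dy`, computed once at the output, is reused for every earlier gradient; this reuse is what makes back-propagation efficient for networks with many parameters.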
The entire dataset is normally used for training a neural network. However, how to assign the data samples to each parameter update is a design question. Using one sample per update is burdensome and makes the training process fluctuate. On the other hand, using the entire dataset at once for each update, i.e., batch training, can be highly computationally intensive. Hence, the training data are usually divided into mini-batches, i.e., smaller portions of the dataset, each of which is used to compute the gradients and update the parameters within one training iteration. The gradient computed on a mini-batch is the mean of the gradients of its data samples. Such a mini-batch strategy is intended to approximate the gradients computed using the entire dataset and is beneficial for training convergence. One pass over the entire dataset, which takes a number of parameter updates, is called an epoch. As introduced, the training objective is to search for the loss minimum iteratively. Tens to millions of epochs are, therefore, necessary to train a neural network.
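A minimal sketch of mini-batch training is given below for a one-parameter model. The model (`w * x`), the noise-free dataset generated by y = 2x, and the values of `batch_size`, `epochs`, and `alpha` are all assumptions for illustration.

```python
import random

def sample_grad(w, x, y):
    """Gradient of the squared error (w*x - y)^2 with respect to w."""
    return 2.0 * (w * x - y) * x

def train(data, batch_size=4, epochs=50, alpha=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):                      # one epoch = one pass over the dataset
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            # Mini-batch gradient = mean of the per-sample gradients.
            g = sum(sample_grad(w, x, y) for x, y in batch) / len(batch)
            w -= alpha * g
    return w

# Hypothetical noise-free dataset generated by y = 2x.
data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 21)]
w_fit = train(data)
```

Here 20 samples with a batch size of 4 give five parameter updates per epoch. Setting `batch_size=1` or `batch_size=len(data)` would reproduce the two extremes discussed above.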
Managing the training complexity of the DNNs is still an open question. Although it is not hard to calculate the computational complexity for each epoch in terms of the number of operations, the training time and model accuracy depend on a number of factors, such as datasets, hardware platform, optimization method, and network architecture. In particular, the quality of training data is crucial for sound DNN models.

2.3. Deep Neural Network Architectures

According to the universal approximation theorem, a shallow neural network consisting of infinitely many neurons can approximate any continuous function [39]. Therefore, shallow NNs are theoretically powerful enough to model any dataset. However, for successful and efficient learning, their inputs are conventionally manually engineered features rather than correlated and redundant raw data [40]. Features are attributes or encodings of raw data that are more salient for solving the learning task, e.g., the object edges in images are better features for human detection than raw pixel values.
However, learning tasks and data are becoming more and more complex. Manual feature engineering is hence less and less effective. Moreover, shallow networks generalize poorly on complex raw data and are slow to train [41]. Thankfully, DNNs can automatically learn hierarchical features that are generic and reusable among various data samples. Hence, DL is known for feature learning, or representation learning, in contrast to manual feature engineering. Thanks to the rapid improvements in computational power in recent decades, deeper and more complicated neural networks have become widely usable. A DNN architecture refers to the number of neurons, their connection topology, and the intended learning task. Numerous DNN architectures have been invented in recent decades for more efficient and effective feature learning.
The success of deep learning started from supervised learning, especially image processing tasks, with the invention of convolutional neural networks (CNNs) [42]. CNNs have convolutional layers (see Figure 5a), which are derived from the convolution operators in traditional signal processing. Convolution kernels are good at exploiting the relations between neighboring input signals and learning spatial-invariant features that might be distinguished data representations (salient features). They are also more computationally efficient than fully-connected layers and specifically used for image processing tasks, such as visual inspection [10].
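To make the convolution operation concrete, the sketch below slides a small kernel over an image and records the response at each position. The 4x4 image, the 2x2 edge kernel, and the choices of no padding and stride 1 are assumptions for illustration; in a CNN the kernel values would be learned rather than hand-set.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation as used in CNN layers (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1]] * 4
# Hand-set kernel that responds to left-to-right intensity increases.
edge_kernel = [[-1, 1], [-1, 1]]
feature_map = conv2d(image, edge_kernel)
```

The same four kernel weights are reused at every image position, so the edge produces the same response in every row. This weight sharing is also why a convolutional layer needs far fewer parameters than a fully-connected layer on the same input.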
In order to learn the features describing the relations between successive input signals, recurrent neural networks (RNN) were proposed, which are characterized by the use of RNN cells [43]. Unlike feedforward neural networks, in which the signals are propagated sequentially without any loops, i.e., revisiting previous data samples, RNN cells compute outputs based on current and previous inputs in order to learn temporal-invariant features, as shown in Figure 5b. At each time step, the new data samples combined with the outputs from the last time step are fed into the network. RNNs are usually used for time series analysis, such as in anomaly detection for quality assurance with continuous sensory measurements [44].
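The recurrence can be sketched with a single scalar cell. The specific cell form (a tanh over a weighted sum of the current input and the previous output), the weights, and the input sequence are assumptions for illustration and are simpler than the cells used in practice.

```python
import math

def rnn_cell(x, h_prev, w_x, w_h, b):
    """New output computed from the current input and the previous output."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0                      # initial state before any input is seen
    states = []
    for x in sequence:           # the same weights are reused at every time step
        h = rnn_cell(x, h, w_x, w_h, b)
        states.append(h)
    return states

# A single impulse followed by zeros: later outputs still carry its trace.
states = run_rnn([1.0, 0.0, 0.0])
```

Although the inputs at the second and third time steps are zero, the outputs remain non-zero because the cell feeds its previous output back in. This is the mechanism that lets an RNN relate successive measurements in a time series.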
CNNs and RNNs are layer-level architectures that can be combined with other layer-level operators, e.g., fully-connected layers. For example, multiple layers of CNNs or RNNs can be stacked to learn multi-level spatial- or temporal-invariant features, respectively [7]. DNN architectures are defined in such a layer-wise stacking manner for different tasks and data modalities.
In order to exploit the internal connections of complex datasets, DNNs are also used for unsupervised learning due to their distinguished feature learning abilities. Auto-encoders (AEs) are a representative technique for dimension reduction. As shown in Figure 6, an AE consists of two different DNNs, the encoder and the decoder. The encoder maps the inputs to a space of lower dimensions, the so-called latent space. The latent values describe the original data source more efficiently than the raw data and, hence, can be used as a general representation for all data samples. During training, the loss function is defined to encourage the decoder to recover the original inputs from the latent space data as well as possible. Therefore, encoders are capable of capturing the key information that is sufficient for data reconstruction.
However, AEs extract the features in an unstructured way since the latent values are automatically learned by the network and sparsely connected. Kingma et al. hence propose variational autoencoders (VAEs), assuming that the data samples follow underlying probabilistic distributions, i.e., Gaussian [45]. VAEs are trained to find the parameters of these Gaussian distributions, which can be utilized to exploit the statistical relations between the datasets. Moreover, samples drawn from the latent distributions can be reconstructed into meaningful, novel data (see [45] for details), which is why VAEs are known as generative models.
In addition to VAEs, generative adversarial networks (GAN) are another class of generative models. As shown in Figure 7, a GAN consists of two parts, a generator DNN and a discriminator DNN. The two parts are trained in an adversarial way: the generator is encouraged to produce fake samples that resemble real samples as much as possible, whereas the discriminator is encouraged to detect fake ones. The training stops when both are optimized. The generator will finally be capable of imitating the statistical distribution of the real data and create new realistic samples. Both encoder/decoder and generator/discriminator can be CNNs, RNNs, or hybrid.
Apart from the above-mentioned network architectures, the attention mechanism is one of the greatest breakthroughs in the machine learning community and was first proposed for machine translation [46]. Attention allows neural models to automatically focus on the parts of the inputs that are more relevant to the task outputs. The attention mechanism has been widely applied in industrial applications to improve performance. Li et al. utilize a GRU (an RNN variant) with an attention mechanism for sales forecasting in [47]. Wang et al. designed a CNN with attention for machinery prognostics in [48].
We summarize the above-introduced network architectures in Table 2. Dozens of other network architectures have been proposed, such as deep belief networks or neural Turing machines. A more comprehensive introduction to DNN architectures is available in The Neural Network Zoo [49]. Moreover, a successful industrial deep learning application requires far more than the right model selection. After our extensive study of the current research and industrial applications of AI in smart manufacturing, we have identified three dominating challenges: data quality, data secrecy, and AI safety. These challenges, along with potential solutions, will be discussed in the remainder of the paper.
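One way to picture how attention weights the input parts is the sketch below. It uses scaled dot-product scoring followed by a softmax, which is one common variant and not necessarily the formulation used in [46], [47], or [48]; the query, keys, and values are hypothetical numbers.

```python
import math

def softmax(scores):
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)                    # relevance of each input part
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# Three hypothetical input parts, each with a key and an associated value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
weights, context = attention([2.0, 0.0], keys, values)
```

The weights sum to one, and input parts whose keys match the query more closely contribute more to the resulting context vector. In a trained model the queries, keys, and values are themselves produced by learned layers.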

3. Challenge: Data Quality

For deep learning, it is widely believed that “the larger the datasets, the better the performance”. Training a DNN from scratch usually requires huge numbers of data samples, even on the order of millions, as the number of parameters in state-of-the-art models is usually in the millions. However, having datasets of sufficient quality is a luxury for deep learning developers. Firstly, reliable data collection and storage is labor- and cost-intensive, as it usually requires the installation of new infrastructure. Secondly, data preparation, such as removing mistakes and errors, data format transformation, and labeling, requires additional effort. Thus, research is exploring ways to develop accurate deep learning models with non-ideal datasets.

3.1. Data Augmentation

Data augmentation aims at artificially expanding the data volume and possibly enhancing the data diversity for better model generalization, i.e., to avoid overfitting. Generally, there are three branches of data augmentation methods, namely manual synthetic data, traditional signal processing methods, and machine-learning-based methods. A far more comprehensive survey on data augmentation is given in [61].

3.1.1. Manual Methods

Data can be augmented with artificial samples through simulation or laboratory experiments, which are usually domain-dependent. For instance, 3D-object/CAD models are often used to mimic the imaging processes to obtain 2D images, e.g., mechanical element images for object detection [62]. Lessmeier et al. sample the sensory measurements of motor vibration for bearing fault diagnosis through accelerated lifetime tests in [63]. They simulate four different operating conditions and record 20 stream measurements of four seconds for each setting.

3.1.2. Signal-Processing-Based Methods

Traditional signal processing methods for data augmentation are usually heuristic. For images, e.g., used in visual inspection [64], commonly used methods are rotating, flipping, cropping, color channel manipulation, and noise injection. For time series, e.g., used in anomaly detection [65], data augmentation can be conducted in the time domain, frequency domain, and a hybrid with local averaging, perturbation, shuffling, etc.
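A few of the heuristics listed above can be sketched as follows. The tiny 2x2 image, the short time series, the noise level `sigma`, and the fixed random seed are assumptions for illustration; real pipelines would apply such transforms to full images or sensor streams.

```python
import random

def flip_horizontal(image):
    """Mirror each image row."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*image[::-1])]

def add_noise(series, sigma, rng):
    """Noise injection: perturb each value with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in series]

def augment_image(image):
    """Return the original image together with its augmented variants."""
    return [image, flip_horizontal(image), rotate_90(image)]

rng = random.Random(42)                 # fixed seed so that the sketch is repeatable
image = [[1, 2], [3, 4]]
images = augment_image(image)
series = [0.0, 1.0, 0.0, -1.0]
noisy = add_noise(series, sigma=0.05, rng=rng)
```

Each transform keeps the label of the original sample, which is only valid if the transformed sample could plausibly occur in reality. A flipped image of a symmetric part is fine, for example, while a flipped image of printed text may not be.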

3.1.3. Machine-Learning-Based Methods

It is sometimes hard to generate samples that sufficiently resemble reality with the above-mentioned methods as certain data generation processes cannot be modeled analytically. Generative models are, however, good at modeling the data generation processes in a data-driven manner. Real data samples are usually used to train the generative models that can generate new samples afterward. Gao et al. study GAN-based data augmentation methods for fault diagnosis in [66]. They developed experiments to simulate the stream data with multiple datasets and show that the classification accuracy can be improved by up to 5% when using the mixed data.
It has to be noted that synthetic datasets created through data augmentation cannot fully replace real data. Efforts have to be made to close the gap in model accuracy between training with real samples and training with augmented samples. First of all, the augmented data should be in line with reality, meaning that no samples shall be generated that could not be produced by the actual source. Secondly, the models trained with such synthetic datasets should be tested using real data. Domain adaptation (DA) techniques are usually applied here to secure the model performance on real data [67]. DA is a class of transfer learning (see Section 3.4) scenarios that addresses the case where training and testing data do not follow the same statistical distribution.

3.2. Semi-Supervised Learning

As the name implies, semi-supervised learning (SSL) is the conjunction of supervised and unsupervised learning methods, which is usually applied when the labeled data are not sufficient [68]. In SSL, the probability distributions of labeled and unlabeled samples are assumed to be equal. The same holds for their relationships with the labels.

3.2.1. Self-Teaching

The most intuitive SSL method is self-teaching. The model is initially trained using labeled data and then used to label the unlabeled samples through prediction instead of conventional manual labeling. Such labels are often called pseudo labels. In the following training epochs, the models are trained using labeled and pseudo-labeled data together, as in normal supervised learning. The pseudo labels have to be updated with the newest model after each epoch until the loss function converges [69].
Rajani et al. train a CNN with self-teaching strategies for processing ultrasound signals to detect fouling in industrial equipment in [70]. Their results show that training with semi-supervised learning can reduce the error rate by 15% compared with using labeled data only. Self-teaching methods are easy to implement, but the errors can propagate across the epochs and reinforce themselves if the model accuracy is initially low. Moreover, the approach does not scale well when the unlabeled data greatly outnumber the labeled data.
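The self-teaching loop can be sketched with a deliberately simple stand-in model. Everything specific here is an assumption for illustration: a nearest-centroid classifier replaces the DNN, the data are one-dimensional, the class names are made up, and the loop stops when the pseudo labels no longer change rather than when a loss function converges.

```python
def fit_centroids(samples, labels):
    """Nearest-centroid classifier: one mean value per class."""
    centroids = {}
    for cls in set(labels):
        members = [x for x, y in zip(samples, labels) if y == cls]
        centroids[cls] = sum(members) / len(members)
    return centroids

def predict(centroids, x):
    return min(centroids, key=lambda cls: abs(x - centroids[cls]))

def self_teach(labeled_x, labeled_y, unlabeled_x, max_rounds=10):
    centroids = fit_centroids(labeled_x, labeled_y)   # initial training on labeled data
    pseudo = None
    for _ in range(max_rounds):
        new_pseudo = [predict(centroids, x) for x in unlabeled_x]
        if new_pseudo == pseudo:                      # pseudo labels stopped changing
            break
        pseudo = new_pseudo
        # Retrain on labeled and pseudo-labeled data together.
        centroids = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo)
    return centroids, pseudo

labeled_x, labeled_y = [0.0, 10.0], ["ok", "fault"]
unlabeled_x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
centroids, pseudo = self_teach(labeled_x, labeled_y, unlabeled_x)
```

With only two labeled samples, the six unlabeled ones pull the class centroids towards the bulk of the data. If the initial model were poor, the same loop would reinforce its wrong pseudo labels, which is the error propagation risk noted above.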

3.2.2. Generative Models

Under the SSL assumption that probability distributions and labeling rules are the same for labeled and unlabeled data, samples that are numerically close to each other have a high probability of holding the same labels. The unlabeled data can thus be classified according to their neighboring labeled samples. In order to compare the numerical distances more reliably, researchers project the samples to feature spaces (normally a latent space) using unsupervised learning methods, e.g., generative models (see Section 2.3). Kingma et al. propose their M1 and M2 models based on this rationale in [71]. The pipeline is illustrated in Figure 8.
Compared with self-teaching, these methods are normally of higher accuracy if unlabeled samples increase. Yoon et al. predict the remaining useful life for turbofan engines based on sensory time series measurements and a VAE-based M1 model in [8]. The results tested on multiple prediction models show that the accuracy can be improved by 5% with the VAE-SSL. Zhang et al. apply the M2 model for bearing anomaly detection with vibration measurements in [72]. After testing on different datasets, the classification accuracy improves between 3% and 30% with the variation of the labeled data portion.

3.2.3. Graph-Based Methods

Datasets can be represented as graphs, where the nodes are the data samples, and the edges reflect the numerical distances between samples and even more information describing their relations, such as communications between the nodes. Graph-based SSL methods are normally based on the assumption that the close samples in the graphs should hold the same labels.
Yang et al. propose to learn the graph embeddings and then exploit them as inputs to the DNN classifiers [73]. The embeddings are the representations or encodings of the graph properties, e.g., node features that include the node attributes and the relationships with the neighbors. The graph embeddings are learned by forcing the DNNs to predict the node context, i.e., neighboring node features, which enables the DNNs to capture the relations between neighboring nodes. Chen et al. apply such graph embedding learning SSL methods to steel plant energy prediction in [74]. The performance can increase by up to 0.15 in terms of the model correlation coefficient (MCC), a metric that represents the similarity between two variables.
Kipf et al. apply graph neural networks (GNNs) for semi-supervised node classifications, in which the GNN inputs are the sub-graphs (part of nodes and their edges) and the outputs are the predicted labels of the nodes, as shown in Figure 9. GNNs are neural networks that are specifically applied to graph processing and good at representing the node features, edges, and sub-graphs. The model is trained in a supervised fashion through minimizing the wrong predictions for the labeled nodes.
SSL is essential in industrial deep learning applications since the labels are often costly to obtain. Graph-based SSL methods have wider application domains since the datasets can be not only individual instances but also natural explicit graphs, e.g., communication networks, in which there are information exchanges between the nodes.

3.3. Active Learning

In addition to semi-supervised learning, active learning (AL) is another category of algorithms used to deal with the scenarios when the annotated data are not sufficient. In active learning, the ML models are initially trained using the labeled data, and then adapted to select the unlabeled samples to query humans (oracle) for labeling. In the next iteration, the newly labeled data will be utilized for model retraining. Such iterations repeat until the model performance is satisfactory (convergence) or the labeling budget is depleted.
The primary challenge of active learning is the query strategy that selects the samples to label. Its goal is to minimize the volume of data to be labeled while maintaining the accuracy of the trained model, which means finding the samples that the model is likely to be wrong about. However, neural networks normally cannot judge the correctness of their own predictions. There are mainly two categories of query strategies for deep neural networks, i.e., uncertainty sampling and diversity sampling.
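The train, query, label, retrain cycle can be sketched on a toy problem. All specifics are assumptions for illustration: the model is a one-dimensional threshold classifier, the `oracle` function stands in for the human annotator, the true class boundary at 0.62 is made up, and the query strategy simply picks the unlabeled sample closest to the current decision threshold.

```python
def fit_threshold(labeled):
    """Place the decision threshold midway between the two classes."""
    neg = max(x for x, y in labeled if y == 0)
    pos = min(x for x, y in labeled if y == 1)
    return (neg + pos) / 2.0

def active_learning(pool, oracle, labeled, budget):
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(budget):                      # stop when the labeling budget is depleted
        if not pool:
            break
        threshold = fit_threshold(labeled)       # (re)train on the current labels
        # Query strategy: the sample the current model is least sure about.
        query = min(pool, key=lambda x: abs(x - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))   # ask the annotator for the label
    return fit_threshold(labeled), labeled

def oracle(x):
    """Hypothetical human annotator with a true class boundary at 0.62."""
    return 1 if x >= 0.62 else 0

pool = [i / 100.0 for i in range(1, 100)]
threshold, labeled = active_learning(pool, oracle, [(0.0, 0), (1.0, 1)], budget=7)
```

In this toy setting, seven queries are enough to place the threshold close to the true boundary, whereas labeling the whole pool would take 99 annotations. The saving comes entirely from the query strategy, which is why the following subsections focus on it.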

3.3.1. Uncertainty Sampling

The main idea of uncertainty sampling is to search for samples that the models are not certain about. In the context of deep learning, because deep neural networks normally cannot report the uncertainties of their predictions, most uncertainty sampling query strategies rely on Bayesian neural networks (BNNs) [75]. BNNs are different from conventional neural networks in that their parameters are statistical distributions instead of deterministic values. During inference, these parameter distributions are sampled to compute the predictions, and the predictions on the same inputs can be different after each sampling.
When the predictions given by the differently sampled models disagree with each other, the degree of disagreement serves as an uncertainty score, and it can be assumed that most of these predictions are wrong. This is the idea behind Bayesian active learning by disagreement (BALD) [76]. Samples with high uncertainty scores are selected for annotation. Mathematically, BALD selects the samples that maximize the mutual information between the predictions and the model parameters. However, for deep learning, frequent retraining with single data samples is inefficient and can result in overfitting. Kirsch et al. propose BatchBALD, which extends the single-sample selection in BALD to a batch of samples by jointly exploiting their mutual information with the model parameters [77]. Martinez-Arellano et al. implement a BNN-based active learning method for tool condition monitoring that can handle the appearance of new sensory measurement patterns caused by degradation [78]. The uncertainty information determines which data to label and utilize for retraining in order to improve performance.
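The mutual-information score can be estimated from the predictions of several sampled models as the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below assumes that the class-probability vectors of the sampled models are already available; the two example inputs and their probabilities are hypothetical.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def bald_score(mc_predictions):
    """Estimate the mutual information between prediction and model parameters.

    mc_predictions: one class-probability vector per sampled model.
    """
    n = len(mc_predictions)
    classes = len(mc_predictions[0])
    mean = [sum(p[c] for p in mc_predictions) / n for c in range(classes)]
    expected_entropy = sum(entropy(p) for p in mc_predictions) / n
    return entropy(mean) - expected_entropy

# Hypothetical outputs of three sampled models on two different inputs.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
scores = {"agree": bald_score(agree), "disagree": bald_score(disagree)}
```

The input on which the sampled models agree scores close to zero, while the one on which they disagree scores clearly higher and would be sent for annotation first.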

3.3.2. Diversity Sampling

Diversity sampling aims to select a subset of the entire unlabeled data for annotation such that a model trained on the subset stays as close as possible in accuracy to a model trained on the whole dataset. The volume of data to be labeled can thereby be greatly reduced. Geifman et al. propose the “long-tail” method, which selects the samples farthest from the already labeled ones by comparing the samples numerically [79]. Sener et al. publish their core-set approach, in which the query strategy is turned into a core-set search problem [80]. In their work, it is proved that searching for a core-set of a dataset while maintaining the DNN training accuracy is equivalent to a k-Center problem. In a k-Center problem, k centers are chosen from a set of samples so that the maximum distance of each sample to its closest center is minimized.
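The k-Center objective can be approximated with a simple greedy heuristic, sketched below. The sketch assumes that samples are represented as numeric feature vectors and that at least one sample is already labeled; the function name and the use of Euclidean distance are illustrative choices rather than details taken from [80].

```python
import math

def k_center_greedy(points, labeled_idx, k):
    """Greedily pick k new centers: always take the point that is
    farthest from its closest already chosen center.

    points: list of feature vectors (tuples or lists of floats)
    labeled_idx: non-empty list of indices that are already labeled
    """
    # Distance of every point to its closest existing center.
    min_dist = [min(math.dist(p, points[c]) for c in labeled_idx) for p in points]
    selected = []
    for _ in range(k):
        far = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(far)
        # The new center may now be the closest one for some points.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[far]))
    return selected
```

Each step reduces the largest remaining distance to a center, so the selected samples spread over the feature space instead of clustering around the labeled ones.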
Diversity sampling is often combined with uncertainty sampling, since the samples selected through uncertainty sampling alone often show insufficient diversity and informativeness [77]. Chen et al. implement a hybrid active learning query strategy consisting of both uncertainty and diversity sampling for gearbox fault diagnosis [81]. They show that their hybrid method surpasses pure uncertainty and pure diversity sampling and achieves the highest classification accuracy of 90%.
Xu et al. likewise adopt active learning for software defect detection by combining uncertainty and diversity measures [82], and they also show that the hybrid method surpasses pure uncertainty and pure diversity sampling.

3.4. Transfer Learning

The philosophy of transfer learning (TL) is to mitigate data scarcity by using similar datasets: a model in a target domain $D_t$ is learned with the help of a source domain $D_s$. A domain consists of the ML task, the corresponding datasets, and the model. The data in $D_s$ and $D_t$ are not required to follow the same statistical distribution. TL helps to reduce the amount of labeled data needed (similar to semi-supervised learning) and to accelerate model convergence. However, it should be noted that TL can result in negative transfer, where the target model accuracy degrades after transferring, especially when $D_s$ and $D_t$ differ substantially. According to the contents transferred, transfer learning can be categorized into instance-, feature-, parameter-, and relational-based approaches [83]. Among these, feature- and parameter-based TL are the most commonly adopted for deep learning models.

3.4.1. Instance-Based Transfer Learning

In instance-based transfer learning, samples in $D_s$ are utilized directly together with $D_t$ samples to train the models. The $D_s$ samples should first be selected according to their relevance to the target domain data in order to reduce the possibility of negative transfer [84]. Relevance is normally measured by the test accuracy of $D_t$ models on $D_s$ samples or by statistical similarities between the $D_s$ and $D_t$ data distributions.
Wang et al. propose to select the $D_s$ samples using DNNs in a GAN fashion, in which the “generator” (data selector) selects the $D_s$ samples to fit the $D_t$ data distribution and fool the “discriminator”, while the “discriminator” distinguishes them from the $D_t$ data as well as possible to boost the data selector performance [85]. Zhang et al. build a ball screw degradation recognition application using instance-based TL, in which they select samples in $D_s$ according to their reconstruction errors with the AEs trained in $D_t$ [86]. With experimental results on multiple datasets, they find that models trained with transfer learning are superior to the ones trained on small datasets only.

3.4.2. Feature-Based Transfer Learning

Feature-based transfer learning establishes the collaboration between $D_s$ and $D_t$ by exploiting their common features, i.e., domain-independent features. However, DNNs learn the features automatically from the data, which makes the features hard to interpret explicitly and to compare across the domains. Technically, extra loss functions are imposed on the neural networks during training so that the models learn features that are transferable across the domains [87].
The target model is trained using $D_s$ and $D_t$ data simultaneously, and the extra loss functions often minimize the numerical distances between the outputs of the intermediate layers computed on $D_s$ and $D_t$ data. The model can, therefore, learn highly similar features from $D_s$ and $D_t$ data, thereby addressing the data scarcity issue of the target domain. Zhu et al. apply this rationale to industrial fault diagnosis on bearings in [88]. The results of transfer learning are better than the baselines using vanilla training.
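One simple instance of such an extra loss is the squared distance between the mean intermediate-layer outputs of a source batch and a target batch. The sketch below assumes that the layer outputs are already available as lists of feature vectors; the function names and the weighting factor are hypothetical, and the methods in [87,88] may use different distance measures.

```python
def mean_feature_gap(source_feats, target_feats):
    """Squared Euclidean distance between the mean feature vectors of a
    source-domain batch and a target-domain batch."""
    dim = len(source_feats[0])
    mu_s = [sum(f[d] for f in source_feats) / len(source_feats) for d in range(dim)]
    mu_t = [sum(f[d] for f in target_feats) / len(target_feats) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

def total_loss(task_loss, source_feats, target_feats, weight=1.0):
    """Task loss plus the weighted feature-alignment penalty."""
    return task_loss + weight * mean_feature_gap(source_feats, target_feats)
```

Minimizing the combined loss pushes the network toward features whose statistics agree across the two domains while still solving the task.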

3.4.3. Parameter-Based Transfer Learning

Parameter-based TL is the most common approach for deep transfer learning [89]. The common practice is to first train the model purely in $D_s$ and then transfer its parameters to the target model(s) for initialization, followed by fine-tuning the models with samples from $D_t$. A toy example is illustrated in Figure 10. The source and target networks do not need to be identical; layers that share parameters merely need to be of the same size. Parameter-based TL is widely adopted for CNN-based computer vision tasks. It has been shown that the early layers of pretrained CNN models capture low-level image features, which are relatively independent of the datasets [90]. CNN layers from AlexNet, VGG, or ResNet trained on large-scale image recognition datasets, such as CIFAR and ImageNet, are popular sources for pretraining [91]. Mittel et al. develop a CNN-based metal crack recognition system with pretrained parameters borrowed from the publicly released trained GoogLeNet and achieve an accuracy of 99.8% [9].
Transfer learning has drawn more and more attention in recent years since data scarcity is one of the most common obstacles for deep learning applications. Compared with the other techniques introduced in Section 3, TL is the most widely applied. However, the methods that choose the contents to transfer, and thus prevent negative transfer, are case-dependent.
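The initialization step can be sketched without any deep learning framework. In the sketch below, a model is assumed to be a dictionary that maps layer names to nested lists of weights; the helper names and the layer names in the usage note are hypothetical. Only layers that exist in both models and have the same size are copied, and the remaining target layers keep their own initialization for fine-tuning.

```python
import copy

def shape(t):
    """Shape of a nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        if not t:
            break
        t = t[0]
    return tuple(dims)

def transfer_parameters(source, target):
    """Copy source parameters into a copy of the target model wherever the
    layer name matches and the layer size is identical."""
    new_target = copy.deepcopy(target)
    transferred = []
    for name, value in source.items():
        if name in new_target and shape(new_target[name]) == shape(value):
            new_target[name] = copy.deepcopy(value)
            transferred.append(name)
    return new_target, transferred
```

For example, a source model with layers `conv1` (2×2) and `fc` (1×3) and a target model with `conv1` (2×2) and `fc` (1×2) would share only `conv1`; the target `fc` layer would then be trained on target-domain samples.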

3.5. Continual Learning

It is common in deep learning applications that new data patterns emerge over time. For instance, machinery fault classes may vary due to the changes in the working environment and/or the aging of the machine itself. Moreover, fault patterns usually do not appear within short periods but are stretched out over the entire lifetime of a machine. Therefore, it would take years to collect extensive datasets that cover all possible classes. This, however, is undesired by the users, who want to benefit from AI as early as possible.
Continual learning (CL), also known as lifelong learning or incremental learning, aims at updating models through retraining for and with the new incoming data, without forgetting the knowledge learned before [92]. It can be well understood that retraining the DNNs completely from scratch is time-consuming and requires a large memory space to store all the training data. Common methods to mitigate forgetting are regularization approaches, memory replay, and dynamic network architectures. Generally, continual learning can be considered as a special form of transfer learning, in which the source domain refers to the original data and the target domain to the one also including the new samples seen in the field. Therefore, continual learning is also called forward transfer.

3.5.1. Regularization-Based Continual Learning

Regularization-based continual learning methods constrain the parameter updates to keep them close to the configurations before retraining. Representative algorithms in this category are elastic weight consolidation (EWC) [93] and learning without forgetting (LwF) [94]. Li et al. apply LwF to object detection tasks in order to enable the network to remember older, already learned object classes [95]. However, regularization-based methods are hard to scale, and it was demonstrated that new classes cannot be learned incrementally [96].
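The constraint on the parameter updates can be written as a penalty term that is added to the loss on the new data. The sketch below follows the quadratic form commonly used for EWC, in which each parameter is weighted by an importance value (the diagonal Fisher information in [93]); the flat parameter lists and the argument names are simplifying assumptions.

```python
def ewc_penalty(params, old_params, importance, lam):
    """Quadratic penalty that keeps important parameters close to the
    values they had before retraining.

    importance: one non-negative weight per parameter (assumed given).
    lam: strength of the regularization.
    """
    return 0.5 * lam * sum(
        f * (p - o) ** 2 for p, o, f in zip(params, old_params, importance)
    )

def retraining_loss(new_task_loss, params, old_params, importance, lam):
    """Loss on the new data plus the consolidation penalty."""
    return new_task_loss + ewc_penalty(params, old_params, importance, lam)
```

Parameters with a large importance value are effectively frozen, whereas unimportant parameters remain free to adapt to the new data.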

3.5.2. Memory Replay

The idea behind memory replay is intuitive: the model is updated with a mix of new data and a portion of historic data that is saved before retraining to prevent catastrophic forgetting [97]. In the taxonomy of transfer learning, memory replay on historical data can be understood as instance-based transfer learning. With the emergence of GANs and VAEs, efficient storage of historic data can be realized by training such deep generative models [98]. Wiewel et al. use the decoder of a VAE for memory replay to enable continual learning for anomaly detection tasks [99].
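A minimal sketch of the storage variant is given below. It assumes a fixed-size buffer that is filled by reservoir sampling, which is one possible way to keep a uniform random portion of the historic data; the class and method names are hypothetical, and generative replay as in [98,99] would replace the stored samples with samples drawn from a trained generator.

```python
import random

class ReplayBuffer:
    """Fixed-size store of historic samples for retraining."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        """Reservoir sampling: every sample seen so far has the same
        probability of being kept in the buffer."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def mixed_batch(self, new_batch, n_replay):
        """Return the new samples followed by up to n_replay historic ones."""
        n = min(n_replay, len(self.samples))
        return list(new_batch) + self.rng.sample(self.samples, n)
```

During retraining, each batch of new data is extended with a few stored samples, so that the gradient also reflects the previously learned patterns.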

3.5.3. Dynamic Architectures

Dynamic architecture methods expand the neural network parameters and re-assign them among the old and new training data. In PackNet [100], DNN parameters with small values are free to update when retraining for new tasks, whereas the large parameters are kept fixed. Progressive neural networks create new neurons automatically for new tasks (neural expansion), while old neurons are frozen during retraining to prevent forgetting [101]. In classification tasks, new neurons are introduced when new classes appear. The early DNN layers and the output neurons for old classes are usually frozen to retain the learned knowledge [102]. Zhang et al. build an incremental quality prediction system for fluid catalytic cracking based on neural expansion [13]. The results show that the model can adapt to concept drift.
Deep continual learning is a key step for model maintenance, which is important in evolving environments. Hence, from the perspective of practical applications, continual learning algorithms should be stable, i.e., the performance will not greatly drop when new unknown data increases, and scalable, i.e., the memory footprint will not explode after an amount of novel data has arrived. There remains a long way to go before a stable and scalable CL system becomes available.

4. Challenge: Data Secrecy

Industrial data usually do not contain personal information that would call for privacy, yet sensitive business information or even trade secrets, such as production process or supply chain details, often still call for stringent data secrecy.
For reasons of cost and convenience, training and inference are usually performed in a centralized way in the cloud, using either public cloud services or third-party servers. Thus, the data and deep learning models can be revealed during transmission, sharing, and processing. The data can be revealed directly during transmission and processing, and indirectly through model inversion, in which the training data are inferred from the model stored in the cloud. Hence, privacy-preserving machine learning (PPML) techniques have been proposed to protect data after they leave the hands of their owners. Direct data reveal can be prevented through elimination approaches, encryption techniques, and data sanitization, i.e., differential privacy, whereas indirect data reveal is usually reduced through gradient noising. In addition to eliminating, encrypting, or sanitizing raw data, new deep learning architectures have been proposed to prevent data leakage during training or inference; the most representative approach is federated learning. It should be emphasized that the privacy-preserving methods can be applied to decentralized systems as well in order to reduce the potential risks of data reveal. In this section, the main PPML techniques are introduced, including elimination, data encryption, differential privacy, and deep model architectural approaches. Some of these techniques can be used either to protect raw data or to prevent data reveal during training and inference, as summarized in Table 3. For a more comprehensive review on security for smart manufacturing, refer to [103].

4.1. Elimination Approaches

The straightforward solution for privacy protection is to directly eliminate the privacy-sensitive information from raw data. The traditional, commonly used anonymization techniques are k-anonymity [104], l-diversity [105], and t-closeness [106]. To achieve k-anonymity, any individual in the dataset should be indistinguishable from at least k − 1 other individuals. There are two methods for k-anonymity, i.e., suppression and generalization. Suppression removes the values of some entries in the dataset, and generalization groups the entries into categories. Meden et al. apply a generalization mechanism to face deidentification, in which the individual faces are replaced using the average face from the same class [107]. L-diversity is an extension of k-anonymity and aims to ensure that there are at least l distinct values for each sensitive attribute in each equivalence class. The two main approaches for l-diversity are again suppression and generalization. There are two other versions of l-diversity, i.e., entropy l-diversity and recursive l-diversity (see [105]). The main idea is to reduce the granularity of the data representation. T-closeness is an extension of l-diversity in which the distance between the distribution of a sensitive attribute within each equivalence class and the distribution of that attribute in the whole table must not exceed a threshold t. Therefore, the attribute values of each equivalence class cannot reveal much more than the global table.
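The k-anonymity condition and the effect of generalization can be checked with a few lines of code. The sketch below assumes that records are dictionaries and that the quasi-identifying attributes are known; the attribute names `age` and `zip` and both function names are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values is shared by
    at least k records, i.e., each record hides among k - 1 others."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width=10):
    """Generalization: replace the exact age with an age range."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}
```

Three records with ages 31, 35, and 38 and the same postal code are not 2-anonymous with respect to (`age`, `zip`), but after generalizing the age to the range 30-39 they form a single group of size three.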

4.2. Cryptographic Approaches

One direct approach to protect the data secrecy is to transfer the encrypted data between the machines. Encryption can be applied to protect data secrecy for both raw data and model training and inference. We introduce the two most common and representative encryption approaches here, i.e., homomorphic encryption and functional encryption. However, it has to be emphasized that data encryption approaches are time-consuming, and the communication cost is high.

4.2.1. Homomorphic Encryption

In order to operate on the encrypted data without decryption, the encryption methods mostly rely on homomorphic encryption (HE), in which the data are processed in the ciphertext domain, i.e., with encryption, and the decrypted results are equal to the directly computed results in plaintext domain, i.e., without encryption. This property is named consistency. For more details on HE, see [108].
The raw data are first encrypted by the data owners before transmission, and the machine learning models can then operate on the encrypted data. The encrypted inference results or trained models are then transmitted back to the data owners and decrypted locally. Therefore, only the data owners have access to the raw data. From the computational point of view, neural network training and inference can be mainly decomposed into addition, multiplication, and comparison operations. The idea behind encryption approaches for DNNs is mainly to apply encryption strategies to these operations. Zhang et al. propose back-propagation methods based on HE and achieve accuracy comparable to training in the plaintext domain [109]. Often, addition and multiplication consistencies are preserved, and the activation functions are approximated for efficiency. Such function approximation introduces errors.
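The consistency property can be made concrete with a toy additively homomorphic scheme. The sketch below uses the Paillier cryptosystem, which is chosen here only for illustration and is not necessarily the scheme used in [109]; the tiny fixed primes make it completely insecure. Paillier supports addition of ciphertexts and multiplication by a plaintext constant, but not multiplication of two ciphertexts.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation with tiny fixed primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (n, lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(private_key, c):
    n, lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Ciphertext product corresponds to plaintext sum."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Ciphertext power corresponds to plaintext times constant k."""
    return pow(c, k, n * n)
```

A server that holds only the public key `n` can thus compute weighted sums of encrypted inputs, for example the pre-activation of a linear layer with plaintext weights, and only the data owner can decrypt the result.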
Microsoft provides a cloud service that allows users to conduct neural computing in the ciphertext domain [110]. Microsoft Azure allows users to upload encrypted data to the cloud and run machine learning algorithms directly on the encrypted data [111]. Amazon SageMaker from Amazon Web Services (AWS) provides similar services, as introduced in [112]. Zhu et al. propose a machine learning training framework that supports homomorphic encryption in [113].

4.2.2. Functional Encryption

Unlike homomorphic encryption, in which the model outputs are in the ciphertext domain, with functional encryption (FE), the outputs can be directly accessible in the plaintext domain [114]. We compare this scheme with HE in Figure 11. However, compared with HE, FE is still in the infancy stage and has not been widely applied.
Functional encryption schemes are designed for the key holders to calculate a particular function (the functionality) on encrypted data. Two major schemes are usually utilized for deep learning-related calculations, namely the inner-product scheme [115] and the quadratic scheme [116]. As the names imply, the inner-product and quadratic schemes allow inner-product and quadratic calculations, respectively, to be carried out in the ciphertext domain. Xu et al. propose CryptoNN, in which the inner-product scheme is applied in neural network training [117]. Ryffel et al. apply a quadratic scheme to polynomial neural networks, which are an approximation of normal neural networks [118], as shown in [114].
Song et al. built the application on the cloud to match the data sellers and potential customers with encryption in [119]. There are currently no well-known cloud services supporting PPML with functional encryption.

4.3. Differential Privacy

Differential privacy (DP) aims to minimize the chances of identifying individual samples from the respective function outputs by means of perturbation, i.e., imposing noise [120].
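A classical way of imposing such noise on a function output is the Laplace mechanism, sketched below for the mean of bounded values. The bounds, the privacy parameter `epsilon`, and the function names are assumptions made for illustration; the noise scale is the sensitivity of the mean divided by `epsilon`.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_mean(values, lower, upper, epsilon, rng=None):
    """Mean of the values released with epsilon-differential privacy.

    Values are clipped to [lower, upper], so that changing one value can
    move the mean by at most (upper - lower) / n (the sensitivity).
    """
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` gives stronger privacy but a noisier output, which reflects the trade-off between privacy and data utility.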

4.3.1. Data Noising

DP-based privacy-preserving methods have been applied by Google, Apple, Microsoft, and LinkedIn for user data collection [121]. However, DP-based methods can harm data utility, especially the so-called context-free perturbation methods, which perturb all data features indiscriminately to cover the worst case in which the attackers have unlimited auxiliary knowledge of the data [122]. Alternatively, context-aware privacy-preserving methods have been proposed, which perturb only the privacy-related features by exploiting the data statistics. Huang et al. propose to learn the privacy mechanism using a GAN that takes raw data as input and outputs sanitized data [122]. Nevertheless, these methods are mainly domain-dependent and require data utility tests to ensure that the data are almost equally useful for machine learning tasks before and after sanitization.
Microsoft has developed the SmartNoise system, which provides DP modules, including differentially private machine learning model training, to assist users in configuring customized privacy protection solutions [123]. In the energy consumption prediction application in [124], data are sanitized with DP before being fed into the deep learning models.

4.3.2. Gradient Noising

In addition to noising the raw data as introduced above, DNN-related quantities, such as gradients and model outputs, can also be perturbed to prevent data reveal through model inversion, especially when the attackers have access to the model architectures and parameters. These perturbation methods are mostly based on differential privacy and on the assumption that the trained DNN models encode features of the training data. The model parameters are, therefore, perturbed to reduce the probability of inferring the training data from the models.
Abadi et al. propose to perturb the gradients during training using differential privacy (DP-SGD) to weaken the connection between the model parameters and the training data [125]. Papernot et al. propose to add noise to the ensemble outputs of the DNNs to obfuscate the effects of individual models on the final output [126]. There are more related works, such as imposing noise on loss functions [127]. Various DL frameworks support DP-SGD, such as TensorFlow Privacy and Huawei MindSpore.
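The gradient perturbation step of DP-SGD can be sketched as follows: each per-sample gradient is clipped to a maximum norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. The sketch assumes that the per-sample gradients are given as flat lists; the argument names are illustrative, and the privacy accounting that determines a suitable noise multiplier is omitted.

```python
import math
import random

def dp_sgd_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clipped, noised, and averaged gradient for one batch."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for g in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Scale down gradients whose norm exceeds clip_norm.
        factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for d in range(dim):
            total[d] += g[d] * factor
    sigma = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, sigma)) / len(per_sample_grads) for t in total]
```

Clipping bounds the influence of any single training sample on the update, and the noise masks the remaining influence.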

4.4. Federated Learning

Federated learning (FL) was initially proposed to enable information sharing without exchanging raw data between the distributed clients during training [128]. In FL, each client trains a local model and uploads the intermediate parameters, i.e., gradients and weights, to the central server. The central server averages the model parameters and then sends them back to the participating clients for the next iteration.
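The aggregation step on the central server can be sketched as a weighted average of the client parameters. Weighting each client by its number of local samples is an assumption borrowed from the common federated averaging scheme; the text above only states that the parameters are averaged. The function name and the flat parameter lists are illustrative.

```python
def federated_average(client_weights, client_sizes):
    """Average the client parameter vectors, weighted by local dataset size.

    client_weights: one flat parameter list per client
    client_sizes: number of local training samples per client
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]
```

The server sends the averaged vector back to all clients, which continue local training from it in the next round, so that raw data never leave the clients.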
Frequent parameter exchange can, however, impose a communication burden on the system that is even greater than that of directly sharing raw data. Meanwhile, infrequent parameter exchange can impede model convergence. Strategies therefore aim to balance communication frequency and model convergence [129]. Nevertheless, parameter sharing can bring risks to data secrecy as well. To address this, privacy-protecting federated learning with encryption approaches has been proposed, in which the parameter aggregation and exchange take place in the ciphertext domain [130]. Bagheri et al. propose applying homomorphic encryption to federated learning for machinery prognostics and health management to protect the data secrecy of enterprises [131]. In addition, FL can be combined with differential privacy, in which the local model parameters are perturbed before being exchanged to prevent adversarial clients from inferring the local data of other clients [132]. Zhou et al. propose a data processing framework for multiple robots based on federated learning [133]. In their work, the data generated by each robot are perturbed using differential privacy approaches. Their experiments demonstrate that the implementation can be applied to multiple robots while balancing privacy and accuracy.
The approach introduced above is conventional federated learning, in which each client holds its own version of the whole model locally; it is also called vertical federated learning. In horizontal federated learning, which is also known as split learning, each client or server owns a portion of the DNN layers, and they cooperate for training and inference [134]. Therefore, only the intermediate layer outputs (at the cut layer) are transferred between the clients and servers instead of raw data. Compared with vertical FL, horizontal FL is still at a very early stage. Vepakomma et al. first proposed split learning for medical image processing with CNNs [134]. Afterward, Abuadbba et al. applied split learning to one-dimensional CNNs and imposed DP noise on the cut layer activations to protect data privacy [135].

5. Challenge: DNN Reliability

Safe deep learning (safeDL) is an active, emerging topic in both the scientific and applied communities that aims for more reliable and robust DL systems. The respective algorithms are currently mainly applied in the healthcare and automotive domains. However, they are equally vital for smart manufacturing applications, as incorrect results can lead to production loss, machine damage, or even human casualties. We focus on DNN reliability by introducing its main algorithmic directions in this section. For a more comprehensive overview, see [136].
Deep learning reliability depends on multiple aspects, yet the most important ones are improper model architecture selection and data shift [136]. Neural architecture selection is highly dependent on domain experience and data and will not be discussed here. Neural architecture search (NAS) is a fast-developing domain in the machine learning community [137], but there are currently no applications of it in smart manufacturing. Data shift refers to the situation in which the testing data no longer follow the same statistical distribution as the training data. Data shift can result from biased training data selection and dynamic environments. These issues are highly common in industrial applications as well. Dreyfus et al. propose a framework for the maintenance of machine learning models, including concept drift detection and concept drift handling in smart manufacturing use cases [138].

5.1. Concept Drift Detection

For streaming data, i.e., samples generated continuously in time order, concept drift refers to the phenomenon that the statistical properties of the data change over time. For instance, the machinery vibration intensity may attenuate due to aging, and the DNN model trained using old data for predictive maintenance may, therefore, fail to detect the anomalies.
Concept drift can usually be detected through model prediction errors, data distribution changes, or a hybrid of both [139]. The first class of methods is based on the assumption that the model accuracy drops significantly when concept drift happens. These methods are compatible with various ML models, including neural networks. The second class compares the statistical similarity of the old and new data; the distributions of the old and new data are estimated for comparison [139]. However, the performance of the statistical methods degrades when the data are high-dimensional. Kabir et al. detect concept drift using the model prediction accuracy for software defect prediction in [140]. Zenisek et al. model the statistical distributions of the old data using multiple machine learning methods for concept drift detection in predictive maintenance in [141]. They assume that machinery errors can be inferred from concept drifts. In [142], Barmeo et al. utilize the model accuracy to detect data drift in order to adapt the model to data that evolve because of the natural degradation of machinery in energy prediction modeling applications. Compared to the methods without concept drift detection, their system can double the fit rate of the energy estimation.
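The first class of methods can be sketched with a sliding window over the prediction outcomes. The detector below takes the error rate of the first full window as the reference and reports drift when the error rate of the current window exceeds the reference by a fixed tolerance; the class name, the window size, and the tolerance are hypothetical, and the detectors used in [140,142] may differ.

```python
from collections import deque

class ErrorRateDriftDetector:
    """Flags concept drift when the recent error rate rises clearly
    above the error rate observed in the first full window."""

    def __init__(self, window=50, tolerance=0.15):
        self.window = window
        self.tolerance = tolerance
        self.reference = None
        self.recent = deque(maxlen=window)

    def update(self, prediction_correct):
        """Feed one prediction outcome; return True if drift is suspected."""
        self.recent.append(0 if prediction_correct else 1)
        if len(self.recent) < self.window:
            return False
        rate = sum(self.recent) / self.window
        if self.reference is None:
            self.reference = rate  # the first full window defines the baseline
            return False
        return rate > self.reference + self.tolerance
```

Such a detector requires ground-truth labels shortly after prediction, which is the main practical limitation of error-based drift detection.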

5.2. Uncertainty Estimation

For safety-critical applications, it is ideal for deep learning models to output the predictions together with the confidence of the model, i.e., the uncertainties. Uncertainties can be caused by non-stationary training and testing data, i.e., testing data sampled from statistical distributions that differ from those of the training data, such as under concept drift, as well as by the intrinsic, inevitable randomness of the data. These two situations correspond to epistemic and aleatory uncertainty, respectively [143]. For DNNs, the uncertainty estimation methods fall into three categories: Bayesian neural networks (BNNs), ensembles, and uncertainty prediction. We introduce each of them in the following text.
In BNNs, the model parameters, i.e., weights and biases, are represented by distributions, e.g., Gaussian, instead of fixed numbers. Multiple inference runs are performed on the same data, and in each run the model parameters are sampled from these distributions. The variance of the results obtained on the same data sample with different parameter samples indicates the uncertainty [143,144]. Benker et al. apply BNNs to the useful life estimation of machinery with sensory time series measurements and quantify the corresponding prediction uncertainty [145]. The estimated uncertainty can be utilized to calibrate the prediction and improve accuracy. In ensembles, a series of models are trained on the same data with different initializations. Similarly, the variance of their predictions indicates the uncertainty [146]. Nemani et al. predict the uncertainty-aware remaining life of bearings using ensembles of LSTMs in [147]. After testing on both open and private real-world datasets, they show that their method is superior to BNN methods in both prediction accuracy and uncertainty quantification. Lastly, DeVries et al. propose to output the confidence of the predictions directly through training [148]. The models are trained by enforcing the prediction probabilities to be close to the label distributions (one-hot encoding of the labels).
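For ensembles, the uncertainty estimate reduces to simple statistics over the member predictions. The sketch below assumes a regression setting in which each ensemble member is a callable that returns a scalar prediction; the function name is illustrative.

```python
import statistics

def ensemble_predict(models, x):
    """Return the mean prediction and the variance across ensemble members.

    models: list of callables, each mapping an input to a scalar prediction
    """
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.pvariance(preds)
```

A large variance indicates that the members disagree on the input, which can be used to reject the prediction or to request a manual check.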

5.3. Out-of-Distribution Detection

Out-of-distribution detection (ODD) is used to search for test samples that have different statistical distributions than the training samples, i.e., the unknowns. ODD samples fall under the scope of data drift, and when the problem specifically concerns classification tasks, i.e., samples from new classes, it is named open set recognition (OSR).
One branch of ODD methods is the uncertainty-based methods, as introduced in Section 5.2. However, these methods hardly reach high accuracy. Moreover, the estimated uncertainties are usually a mixture of epistemic and aleatory uncertainty, while ODD samples cause epistemic uncertainty only. Another branch of ODD methods exploits the intermediate and final layer outputs of the original DNNs and is more computationally efficient. Hendrycks et al. first propose the baseline method utilizing the softmax distributions, i.e., the maximum prediction probability, based on the observation that out-of-distribution samples have lower prediction probabilities than in-distribution samples [149]. Lee et al. propose comparing the intermediate layer outputs of the training and testing data for the detection of out-of-distribution samples [150]. Specifically for OSR, Bendale et al. propose the distance-based OSR method based on the assumption that in-distribution data from the same class should be geometrically close to each other [151]. Xu et al. apply a distance-based OSR method to machinery fault diagnosis, considering the scenario in which unknown fault types appear during inference [152]. With the introduction of OSR, the classification accuracy is greatly improved when out-of-distribution samples exist.
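The softmax baseline can be sketched in a few lines: a test sample is flagged as out-of-distribution when its maximum prediction probability falls below a threshold. The threshold value is an assumption that would have to be tuned on validation data, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax of a list of raw network outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def is_out_of_distribution(logits, threshold=0.7):
    """Flag the sample if the most likely class has a low probability."""
    return max(softmax(logits)) < threshold
```

The check needs no additional model and no retraining, which is why it is often used as a reference point for more elaborate detectors.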

6. Conclusions and Trends

This paper identifies the main common obstacles of industrial AI for smart manufacturing as well as their respective solutions. The identified challenges are data quality, data secrecy and model reliability. Several possible solutions from transfer learning to DNN encryption and uncertainty estimation have been presented and summarized, alongside their smart manufacturing applications. Table 4 lists the applications using these algorithms in smart manufacturing. Part of these algorithms are at the infancy stage and have not been applied in industry yet, although many of them are currently in the spotlight, such as training permutation and out-of-distribution detection.
With increasingly deeper research on the solutions for industry 4.0, more problems with the AI methods in smart manufacturing will emerge and be identified. We can foresee some of them here for discussion. The obstacles discussed in this paper usually can not appear individually, which adds complexity to the solutions. For instance, domain adaption is usually applied altogether with semi-supervised learning since data labeling in practical run time can sometimes be impossible [153]. The algorithms in Section 3 have not paid much attention to data secrecy, such as privacy-preserving transfer learning [154]. These issues can be obstacles to building robust industrial deep learning applications that have not been addressed yet.
Furthermore, related to the three challenges discussed in this paper, data sharing [155,156] and adaptive AI [157] have been proposed in the last two years. Data sharing is a straightforward solution for data scarcity in that the entities can directly learn from the data from their peers in the community. According to [155], more than 70% of global data and analytics decision-makers are expanding their ability to use external data. However, as pointed out in [156], the data sharing in industry encounters the challenges of data secrecy, lack of common ontology, and platforms and services. Data secrecy can be stressed with the approaches introduced in Section 4 so that confidential business information can not be derived externally. A common data ontology aims to ensure data from all entities in the community can be interpreted and merged with each other. Internal and external hardware facilities, such as edge, fog and cloud computing, are needed for collecting, storing and sharing data [158].
As discussed in Section 3.5 and Section 5, data can evolve with time so that retraining is necessary to secure the ML model performance. Adaptive AI enables the model to be self-updated to meet the evolving of datasets. From our point of view, following technical problems should be addressed for adaptive AI: (1) how to improve retraining accuracy and speed up retraining; (2) how to schedule manufacturing to well maintain retraining and normal inference; (3) which hardware architecture is suitable for adaptive AI? For (1), deep continual learning can still not achieve comparable accuracy with vanilla retraining when history data and retraining time are limited. Methods should be proposed to balance accuracy, history data volume and retraining time so that a hybrid retraining strategy can be in action to secure the performance. For (2), one question should be considered—whether the manufacturing process should wait for the retraining or if there is a smarter way to make progress in both simultaneously; For (3), in order to maintain the optimized solution in (1) and (2), economic budgets, data secrecy and individual hardware conditions of the manufactures, computing architectures should be carefully selected.
Last but not least, building a highly reliable AI system for smart manufacturing requires not only effort from machine learning algorithm developers but also collaboration among domain experts, facility engineers, and third-party organizations that coordinate the cooperation between these entities.
Table 4. Summary of deep learning algorithms and their industrial applications.
Application Domains
Supply Chain
Data Quality | Data Augmentation | [64] [63,66] [159] [160] [161]
Data Quality | Semi-supervised Learning | [5,6,72] [8] [11] [15] [162]
Data Quality | Active Learning | [82,163] [78] [164]
Data Quality | Transfer Learning | [9,10] [165,166] [12] [167] [168]
Data Quality | Continual Learning | [169,170] [171] [13] [14] [172]
Data Secrecy | Cryptographic Approaches | [131,173] [174,175] [176]
Data Secrecy | Differential Privacy | [173] [133] [177]
Data Secrecy | Federated Learning | [178] [131,179] [133,180] [181]
DNN Reliability | Concept Drift Detection | [140,182] [141,142] [183]
DNN Reliability | Uncertainty Estimation | [184,185] [78,144,145,147] [186,187] [188,189] [190]
DNN Reliability | Out of Distribution Detection | [152]


This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef] [PubMed]
  2. Google. Cloud Vision API. Available online: (accessed on 18 July 2022).
  3. Research, D. AI Enablement on the Way to Smart Manufacturing: Deloitte Survey on AI Adoption in Manufacturing; Technical Report; Deloitte: London, UK, 2020. [Google Scholar]
  4. Ellström, M.; Erwin, T.; Ringland, K.; Lulla, S. AI in European Manufacturing Industries 2020; Technical Report; Microsoft: Redmond, WA, USA, 2020. [Google Scholar]
  5. Zheng, X.; Wang, H.; Chen, J.; Kong, Y.; Zheng, S. A Generic Semi-Supervised Deep Learning-Based Approach for Automated Surface Inspection. IEEE Access 2020, 8, 114088–114099. [Google Scholar] [CrossRef]
  6. Di, H.; Ke, X.; Peng, Z.; Zhou, D. Surface Defect Classification of Steels with a New Semi-Supervised Learning Method. Opt. Lasers Eng. 2019, 117, 40–48. [Google Scholar] [CrossRef]
  7. Zhao, R.; Yan, R.; Wang, J.; Mao, K. Learning to Monitor Machine Health with Convolutional Bi-Directional LSTM Networks. Sensors 2017, 17, 273. [Google Scholar] [CrossRef] [PubMed]
  8. Yoon, A.S.; Lee, T.; Lim, Y.; Jung, D.; Kang, P.; Kim, D.; Park, K.; Choi, Y. Semi-Supervised Learning with Deep Generative Models for Asset Failure Prediction. arXiv 2017, arXiv:1709.00845. [Google Scholar]
  9. Mittel, D.; Kerber, F. Vision-Based Crack Detection using Transfer Learning in Metal Forming Processes. In Proceedings of the 24th International Conference on Emerging Technologies and Factory Automation (ETFA’19), Zaragoza, Spain, 10–13 September 2019; pp. 544–551. [Google Scholar]
  10. Alencastre-Miranda, M.; Johnson, R.R.; Krebs, H.I. Convolutional Neural Networks and Transfer Learning for Quality Inspection of Different Sugarcane Varieties. IEEE Trans. Ind. Inform. 2020, 17, 787–794. [Google Scholar] [CrossRef]
  11. Kong, Y.; Ni, D. A Semi-Supervised and Incremental Modeling Framework for Wafer Map Classification. IEEE Trans. Semicond. Manuf. 2020, 33, 62–71. [Google Scholar] [CrossRef]
  12. Imoto, K.; Nakai, T.; Ike, T.; Haruki, K.; Sato, Y. A CNN-Based Transfer Learning Method for Defect Classification in Semiconductor Manufacturing. In Proceedings of the International Symposium on Semiconductor Manufacturing (ISSM’18), Tokyo, Japan, 10–11 December 2018; pp. 1–3. [Google Scholar]
  13. Zhang, X.; Zou, Y.; Li, S. Enhancing Incremental Deep Learning for FCCU End-Point Quality Prediction. Inf. Sci. 2020, 530, 95–107. [Google Scholar] [CrossRef]
  14. Alambeigi, F.; Wang, Z.; Hegeman, R.; Liu, Y.H.; Armand, M. A Robust Data-Driven Approach for Online Learning and Manipulation of Unmodeled 3-D Heterogeneous Compliant Objects. IEEE Robot. Autom. Lett. 2018, 3, 4140–4147. [Google Scholar] [CrossRef]
  15. Bynum, J. A Semi-Supervised Machine Learning Approach for Acoustic Monitoring of Robotic Manufacturing Facilities. Ph.D. Dissertation, George Mason University, Fairfax, VA, USA, 2020. [Google Scholar]
  16. Mrazovic, P.; Larriba-Pey, J.L.; Matskin, M. A Deep Learning Approach for Estimating Inventory Rebalancing Demand in Bicycle Sharing Systems. In Proceedings of the 42nd Annual Computer Software and Applications Conference (COMPSAC’18), Tokyo, Japan, 23–27 July 2018; Volume 2, pp. 110–115. [Google Scholar]
  17. Cavalcante, I.M.; Frazzon, E.M.; Forcellini, F.A.; Ivanov, D. A Supervised Machine Learning Approach to Data-Driven Simulation of Resilient Supplier Selection in Digital Manufacturing. Int. J. Inf. Manag. 2019, 49, 86–97. [Google Scholar] [CrossRef]
  18. Li, Y.; Chu, F.; Feng, C.; Chu, C.; Zhou, M. Integrated Production Inventory Routing Planning for Intelligent Food Logistics Systems. IEEE Trans. Intell. Transp. Syst. 2018, 20, 867–878. [Google Scholar] [CrossRef]
  19. Woschank, M.; Rauch, E.; Zsifkovits, H. A Review of Further Directions for Artificial Intelligence, Machine Learning, and Deep Learning in Smart Logistics. Sustainability 2020, 12, 3760. [Google Scholar] [CrossRef]
  20. Ran, Y.; Zhou, X.; Lin, P.; Wen, Y.; Deng, R. A Survey of Predictive Maintenance: Systems, Purposes and Approaches. arXiv 2019, arXiv:1912.07383. [Google Scholar]
  21. Lwakatare, L.E.; Raj, A.; Crnkovic, I.; Bosch, J.; Olsson, H.H. Large-scale Machine Learning Systems in Real-World Industrial Settings: A Review of Challenges and Solutions. Inf. Softw. Technol. 2020, 127, 106368. [Google Scholar] [CrossRef]
  22. Hernavs, J.; Ficko, M.; Berus, L.; Rudolf, R.; Klančnik, S. Deep Learning in Industry 4.0—Brief Overview. J. Prod. Eng. 2018, 21, 1–5. [Google Scholar] [CrossRef]
  23. Wang, J.; Ma, Y.; Zhang, L.; Gao, R.X.; Wu, D. Deep Learning for Smart Manufacturing: Methods and Applications. J. Manuf. Syst. 2018, 48, 144–156. [Google Scholar] [CrossRef]
  24. Shang, C.; You, F. Data Analytics and Machine Learning for Smart Process Manufacturing: Recent Advances and Perspectives in the Big Data Era. Engineering 2019, 5, 1010–1016. [Google Scholar] [CrossRef]
  25. Bertolini, M.; Mezzogori, D.; Neroni, M.; Zammori, F. Machine Learning for Industrial Applications: A Comprehensive Literature Review. Expert Syst. Appl. 2021, 175, 114820. [Google Scholar] [CrossRef]
  26. Dogan, A.; Birant, D. Machine Learning and Data Mining in Manufacturing. Expert Syst. Appl. 2021, 166, 114060. [Google Scholar] [CrossRef]
  27. Sharp, M.; Ak, R.; Hedberg, T., Jr. A Survey of the Advancing Use and Development of Machine Learning in Smart Manufacturing. J. Manuf. Syst. 2018, 48, 170–179. [Google Scholar] [CrossRef]
  28. Kim, D.H.; Kim, T.J.; Wang, X.; Kim, M.; Quan, Y.J.; Oh, J.W.; Min, S.H.; Kim, H.; Bhandari, B.; Yang, I.; et al. Smart Machining Process using Machine Learning: A Review and Perspective on Machining Industry. Int. J. Precis. Eng. Manuf.-Green Technol. 2018, 5, 555–568. [Google Scholar] [CrossRef]
  29. Çınar, Z.M.; Abdussalam Nuhu, A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine Learning in Predictive Maintenance towards Sustainable Smart Manufacturing in Industry 4.0. Sustainability 2020, 12, 8211. [Google Scholar] [CrossRef]
  30. Nasir, V.; Sassani, F. A Review on Deep Learning in Machining and Tool Monitoring: Methods, Opportunities, and Challenges. Int. J. Adv. Manuf. Technol. 2021, 115, 2683–2709. [Google Scholar] [CrossRef]
  31. Weichert, D.; Link, P.; Stoll, A.; Rüping, S.; Ihlenfeldt, S.; Wrobel, S. A Review of Machine Learning for the Optimization of Production Processes. Int. J. Adv. Manuf. Technol. 2019, 104, 1889–1902. [Google Scholar] [CrossRef]
  32. Wang, C.; Tan, X.; Tor, S.; Lim, C. Machine learning in additive manufacturing: State-of-the-art and perspectives. Addit. Manuf. 2020, 36, 101538. [Google Scholar] [CrossRef]
  33. Yang, J.; Li, S.; Wang, Z.; Dong, H.; Wang, J.; Tang, S. Using Deep Learning to Detect Defects in Manufacturing: A Comprehensive Survey and Current Challenges. Materials 2020, 13, 5755. [Google Scholar] [CrossRef]
  34. Kotsiopoulos, T.; Sarigiannidis, P.; Ioannidis, D.; Tzovaras, D. Machine Learning and Deep Learning in Smart Manufacturing: The Smart Grid Paradigm. Comput. Sci. Rev. 2021, 40, 100341. [Google Scholar] [CrossRef]
  35. Wang, F.; Zhang, M.; Wang, X.; Ma, X.; Liu, J. Deep Learning for Edge Computing Applications: A State-of-the-art Survey. IEEE Access 2020, 8, 58322–58336. [Google Scholar] [CrossRef]
  36. Ma, X.; Yao, T.; Hu, M.; Dong, Y.; Liu, W.; Wang, F.; Liu, J. A Survey on Deep Learning Empowered IoT Applications. IEEE Access 2019, 7, 181721–181732. [Google Scholar] [CrossRef]
  37. Ng, A. Machine Learning. Available online: (accessed on 8 August 2021).
  38. Nwankpa, C.; Ijomah, W.; Gachagan, A.; Marshall, S. Activation Functions: Comparison of Trends in Practice and Research for Deep Learning. arXiv 2018, arXiv:1811.03378. [Google Scholar]
  39. Chen, T.; Chen, H.; Liu, R.W. Approximation Capability in C(R^n) by Multilayer Feedforward Networks and Related Problems. IEEE Trans. Neural Netw. 1995, 6, 25–30. [Google Scholar] [CrossRef] [PubMed]
  40. Deng, L. A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning. APSIPA Trans. Signal Inf. Process. 2014, 3. [Google Scholar] [CrossRef]
  41. Bengio, Y. Learning Deep Architectures for AI; Now Publishers Inc.: Delft, The Netherlands, 2009. [Google Scholar]
  42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet Classification with Deep Convolutional Neural Networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS’12), Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  43. Lipton, Z.C.; Berkowitz, J.; Elkan, C. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv 2015, arXiv:1506.00019.
  44. Nanduri, A.; Sherry, L. Anomaly Detection in Aircraft Data using Recurrent Neural Networks (RNN). In Proceedings of the 12th International Conference on Networking and Services (ICNS’16), Lisbon, Portugal, 26–30 June 2016; p. 5C2-1. [Google Scholar]
  45. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the 1st International Conference on Learning Representations (ICLR’13), Scottsdale, AZ, USA, 2–4 May 2013; OpenReview: Scottsdale, AZ, USA, 2013. [Google Scholar]
  46. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR’15), San Diego, CA, USA, 7–9 May 2015; OpenReview: San Diego, CA, USA, 2015. [Google Scholar]
  47. Li, Y.; Yang, Y.; Zhu, K.; Zhang, J. Clothing Sale Forecasting by a Composite GRU–Prophet Model With an Attention Mechanism. IEEE Trans. Ind. Inform. 2021, 17, 8335–8344. [Google Scholar] [CrossRef]
  48. Wang, Y.; Deng, L.; Zheng, L.; Gao, R.X. Temporal Convolutional Network with Soft Thresholding and Attention Mechanism for Machinery Prognostics. J. Manuf. Syst. 2021, 60, 512–526. [Google Scholar] [CrossRef]
  49. van Veen, F. The Neural Network Zoo. Available online: (accessed on 10 August 2021).
  50. Kadar, M.; Onita, D. A deep CNN for image analytics in automated manufacturing process control. In Proceedings of the 2019 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI’19), Pitesti, Romania, 27–29 June 2019. [Google Scholar]
  51. Wei, P.; Liu, C.; Liu, M.; Gao, Y.; Liu, H. CNN-based reference comparison method for classifying bare PCB defects. J. Eng. 2018, 2018, 1528–1533. [Google Scholar] [CrossRef]
  52. Curreri, F.; Patanè, L.; Xibilia, M.G. RNN-and LSTM-based soft sensors transferability for an industrial process. Sensors 2021, 21, 823. [Google Scholar] [CrossRef]
  53. Khan, A.H.; Li, S.; Luo, X. Obstacle avoidance and tracking control of redundant robotic manipulator: An RNN-based metaheuristic approach. IEEE Trans. Ind. Inform. 2019, 16, 4670–4680. [Google Scholar] [CrossRef]
  54. Nguyen, H.; Tran, K.P.; Thomassey, S.; Hamad, M. Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management. Int. J. Inf. Manag. 2021, 57, 102282. [Google Scholar] [CrossRef]
  55. Yun, H.; Kim, H.; Jeong, Y.H.; Jun, M.B. Autoencoder-based anomaly detection of industrial robot arm using stethoscope based internal sound sensor. J. Intell. Manuf. 2021, 1–18. [Google Scholar] [CrossRef]
  56. Tsai, D.M.; Jen, P.H. Autoencoder-based anomaly detection for surface defect inspection. Adv. Eng. Inform. 2021, 48, 101272. [Google Scholar] [CrossRef]
  57. Liu, C.; Tang, D.; Zhu, H.; Nie, Q. A novel predictive maintenance method based on deep adversarial learning in the intelligent manufacturing system. IEEE Access 2021, 9, 49557–49575. [Google Scholar] [CrossRef]
  58. Wu, Y.; Dai, H.N.; Tang, H. Graph neural networks for anomaly detection in industrial internet of things. IEEE Internet Things J. 2021, 9, 9214–9231. [Google Scholar] [CrossRef]
  59. Cooper, C.; Zhang, J.; Gao, R.X.; Wang, P.; Ragai, I. Anomaly detection in milling tools using acoustic signals and generative adversarial networks. Procedia Manuf. 2020, 48, 372–378. [Google Scholar] [CrossRef]
  60. Liu, H.; Liu, Z.; Jia, W.; Lin, X.; Zhang, S. A novel transformer-based neural network model for tool wear estimation. Meas. Sci. Technol. 2020, 31, 065106. [Google Scholar] [CrossRef]
  61. Shorten, C.; Khoshgoftaar, T.M. A Survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  62. Arcidiacono, C.S. An Empirical Study on Synthetic Image Generation Techniques for Object Detectors. 2018. Available online: (accessed on 10 August 2021).
  63. Lessmeier, C.; Kimotho, J.; Zimmer, D.; Sextro, W. Condition Monitoring of Bearing Damage in Electromechanical Drive Systems by Using Motor Current Signals of Electric Motors: A Benchmark Data Set for Data-Driven Classification. Available online: (accessed on 10 August 2021).
  64. Cui, W.; Zhang, Y.; Zhang, X.; Li, L.; Liou, F. Metal Additive Manufacturing Parts Inspection Using Convolutional Neural Network. Appl. Sci. 2020, 10, 545. [Google Scholar] [CrossRef]
  65. Gao, J.; Song, X.; Wen, Q.; Wang, P.; Sun, L.; Xu, H. RobustTAD: Robust Time Series Anomaly Detection via Decomposition and Convolutional Neural Networks. arXiv 2020, arXiv:2002.09545. [Google Scholar]
  66. Gao, X.; Deng, F.; Yue, X. Data Augmentation in Fault Diagnosis based on the Wasserstein Generative Adversarial Network with Gradient Penalty. Neurocomputing 2019, 396, 487–494. [Google Scholar] [CrossRef]
  67. Bird, J.J.; Faria, D.R.; Ekárt, A.; Ayrosa, P.P. From Simulation to Reality: CNN Transfer Learning for Scene Classification. In Proceedings of the 10th International Conference on Intelligent Systems (IS’20), Varna, Bulgaria, 28–30 August 2020. [Google Scholar]
  68. Chapelle, O.; Scholkopf, B.; Zien, A. Semi-Supervised Learning. IEEE Trans. Neural Netw. 2009, 20, 542. [Google Scholar] [CrossRef]
  69. Lee, D.H. Pseudo-label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In Proceedings of the Workshop on Challenges in Representation Learning, ICML 2013, Atlanta, GA, USA, 16–21 June 2013; Volume 3. [Google Scholar]
  70. Rajani, C.; Klami, A.; Salmi, A.; Rauhala, T.; Hæggström, E.; Myllymäki, P. Detecting Industrial Fouling by Monotonicity During Ultrasonic Cleaning. In Proceedings of the International Workshop on Machine Learning for Signal Processing (MLSP’18), Aalborg, Denmark, 17–20 September 2018; pp. 1–6. [Google Scholar]
  71. Kingma, D.P.; Mohamed, S.; Rezende, D.J.; Welling, M. Semi-Supervised Learning with Deep Generative Models. In Proceedings of the Advances in Neural Information Processing Systems (NIPS’14), Montréal, QC, Canada, 8–13 December 2014; pp. 3581–3589. [Google Scholar]
  72. Zhang, S.; Ye, F.; Wang, B.; Habetler, T.G. Semi-Supervised Learning of Bearing Anomaly Detection via Deep Variational Autoencoders. arXiv 2019, arXiv:1912.01096. [Google Scholar]
  73. Yang, Z.; Cohen, W.; Salakhutdinov, R. Revisiting Semi-Supervised Learning with Graph Embeddings. In Proceedings of the International Conference on Machine Learning (ICML’16), New York, NY, USA, 19–24 June 2016; pp. 40–48. [Google Scholar]
  74. Chen, C.; Liu, Y.; Kumar, M.; Qin, J.; Ren, Y. Energy Consumption Modelling using Deep Learning Embedded Semi-Supervised Learning. Comput. Ind. Eng. 2019, 135, 757–765. [Google Scholar] [CrossRef]
  75. Jospin, L.V.; Buntine, W.L.; Boussaid, F.; Laga, H.; Bennamoun, M. Hands-on Bayesian Neural Networks—A Tutorial for Deep Learning Users. arXiv 2020, arXiv:2007.06823. [Google Scholar] [CrossRef]
  76. Houlsby, N.; Huszár, F.; Ghahramani, Z.; Lengyel, M. Bayesian active learning for classification and preference learning. arXiv 2011, arXiv:1112.5745. [Google Scholar]
  77. Kirsch, A.; Van Amersfoort, J.; Gal, Y. Batchbald: Efficient and diverse batch acquisition for deep bayesian active learning. In Proceedings of the 33rd Conference Neural Information Processing Systems (NIPS’19), Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 7026–7037. [Google Scholar]
  78. Martínez-Arellano, G.; Ratchev, S. Towards An Active Learning Approach To Tool Condition Monitoring With Bayesian Deep Learning. In Proceedings of the ECMS 2019, Caserta, Italy, 11–14 June 2019; pp. 223–229. [Google Scholar]
  79. Geifman, Y.; El-Yaniv, R. Deep Active Learning over the Long Tail. arXiv 2017, arXiv:1711.00941. [Google Scholar]
  80. Sener, O.; Savarese, S. Active Learning for Convolutional Neural Networks: A Core-Set Approach. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  81. Chen, J.; Zhou, D.; Guo, Z.; Lin, J.; Lyu, C.; Lu, C. An Active Learning Method based on Uncertainty and Complexity for Gearbox Fault Diagnosis. IEEE Access 2019, 7, 9022–9031. [Google Scholar] [CrossRef]
  82. Xu, Z.; Liu, J.; Luo, X.; Zhang, T. Cross-Version Defect Prediction via Hybrid Active Learning with Kernel Principal Component Analysis. In Proceedings of the 25th International Conference on Software Analysis, Evolution and Reengineering (SANER’18), Campobasso, Italy, 20–23 March 2018. [Google Scholar]
  83. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
  84. Perdana, R.S.; Ishida, Y. Instance-Based Deep Transfer Learning on Cross-domain Image Captioning. In Proceedings of the International Electronics Symposium (IES’2019), Surabaya, Indonesia, 27–28 September 2019; pp. 24–30. [Google Scholar]
  85. Wang, B.; Qiu, M.; Wang, X.; Li, Y.; Gong, Y.; Zeng, X.; Huang, J.; Zheng, B.; Cai, D.; Zhou, J. A Minmax Game for Instance-based Selective Transfer Learning. In Proceedings of the 25th International Conference on Knowledge Discovery and Data Mining (KDD’19), Anchorage, AK, USA, 4–8 August 2019; pp. 34–43. [Google Scholar]
  86. Zhang, L.; Guo, L.; Gao, H.; Dong, D.; Fu, G.; Hong, X. Instance-based Ensemble Deep Transfer Learning Network: A New Intelligent Degradation Recognition Method and Its Application on Ball Screw. Mech. Syst. Signal Process. 2020, 140, 106681. [Google Scholar] [CrossRef]
  87. Long, M.; Wang, J. Learning Multiple Tasks with Deep Relationship Networks. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 1593–1602. [Google Scholar]
  88. Zhu, J.; Chen, N.; Shen, C. A New Deep Transfer Learning Method for Bearing Fault Diagnosis under Different Working Conditions. IEEE Sens. J. 2019, 20, 8394–8402. [Google Scholar] [CrossRef]
  89. Kim, S.; Kim, W.; Noh, Y.K.; Park, F.C. Transfer Learning for Automated Optical Inspection. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’17), Anchorage, AK, USA, 14–19 May 2017; pp. 2517–2524. [Google Scholar]
  90. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: (accessed on 20 September 2021).
  91. Weiaicunzai. Awesome—Image Classification. Available online: (accessed on 20 September 2021).
  92. Parisi, G.I.; Kemker, R.; Part, J.L.; Kanan, C.; Wermter, S. Continual Lifelong Learning with Neural Networks: A Review. Neural Netw. 2019, 113, 54–71. [Google Scholar] [CrossRef]
  93. Zenke, F.; Poole, B.; Ganguli, S. Continual Learning through Synaptic Intelligence. In Proceedings of the 34th International Conference on Machine Learning (ICML’17), Sydney, Australia, 6–11 August 2017; pp. 3987–3995. [Google Scholar]
  94. Li, Z.; Hoiem, D. Learning without Forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 2935–2947. [Google Scholar] [CrossRef] [PubMed]
  95. Li, D.; Tasci, S.; Ghosh, S.; Zhu, J.; Zhang, J.; Heck, L. RILOD: Near Real-Time Incremental Learning for Object Detection at the Edge. In Proceedings of the 4th Symposium on Edge Computing (SEC’19), Arlington, VA, USA, 7–9 November 2019; pp. 113–126. [Google Scholar]
  96. De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; Tuytelaars, T. Continual Learning: A Comparative Study on How to Defy Forgetting in Classification Tasks. arXiv 2019, arXiv:1909.08383. [Google Scholar]
  97. Lopez-Paz, D.; Ranzato, M. Gradient Episodic Memory for Continual Learning. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6467–6476. [Google Scholar]
  98. Shin, H.; Lee, J.K.; Kim, J.; Kim, J. Continual Learning with Deep Generative Replay. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 2990–2999. [Google Scholar]
  99. Wiewel, F.; Yang, B. Continual Learning for Anomaly Detection with Variational Autoencoder. In Proceedings of the 44th International Conference on Acoustics, Speech and Signal Processing (ICASSP’19), Brighton, UK, 12–17 May 2019; pp. 3837–3841. [Google Scholar]
  100. Mallya, A.; Lazebnik, S. Packnet: Adding Multiple Tasks to A Single Network by Iterative Pruning. In Proceedings of the 34th Conference on Computer Vision and Pattern Recognition (CVPR’18), Salt Lake City, UT, USA, 18–22 June 2018; pp. 7765–7773. [Google Scholar]
  101. Rusu, A.A.; Rabinowitz, N.C.; Desjardins, G.; Soyer, H.; Kirkpatrick, J.; Kavukcuoglu, K.; Pascanu, R.; Hadsell, R. Progressive Neural Networks. arXiv 2016, arXiv:1606.04671. [Google Scholar]
  102. Castro, F.M.; Marín-Jiménez, M.J.; Guil, N.; Schmid, C.; Alahari, K. End-to-End Incremental Learning. In Proceedings of the 15th European Conference on Computer Vision (ECCV’18), Munich, Germany, 8–14 September 2018; pp. 233–248. [Google Scholar]
  103. Tuptuk, N.; Hailes, S. Security of Smart Manufacturing Systems. J. Manuf. Syst. 2018, 47, 93–106. [Google Scholar] [CrossRef]
  104. Samarati, P.; Sweeney, L. Protecting Privacy when Disclosing Information: K-anonymity and its Enforcement through Generalization and Suppression. In Proceedings of the IEEE Symposium on Research in Security and Privacy (S&P), Oakland, CA, USA, 3–6 May 1998. [Google Scholar]
  105. Machanavajjhala, A.; Kifer, D.; Gehrke, J.; Venkitasubramaniam, M. L-diversity: Privacy beyond K-anonymity. ACM Trans. Knowl. Discov. Data (TKDD) 2007, 1, 3-es. [Google Scholar] [CrossRef]
  106. Li, N.; Li, T.; Venkatasubramanian, S. t-Closeness: Privacy Beyond k-Anonymity and l-Diversity. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering (ICDE’07), Istanbul, Turkey, 15–20 April 2007. [Google Scholar]
  107. Meden, B.; Emeršič, Ž.; Štruc, V.; Peer, P. K-Same-Net: K-Anonymity with Generative Deep Neural Networks for Face Deidentification. Entropy 2018, 20, 60. [Google Scholar] [CrossRef]
  108. Gentry, C. Fully Homomorphic Encryption using Ideal Lattices. In Proceedings of the 41st Annual Symposium on Theory of Computing (STOC’09), Washington, DC, USA, 31 May–2 June 2009; pp. 169–178. [Google Scholar]
  109. Zhang, Q.; Yang, L.T.; Chen, Z. Privacy Preserving Deep Computation Model on Cloud for Big Data Feature Learning. IEEE Trans. Comput. 2015, 65, 1351–1362. [Google Scholar] [CrossRef]
  110. Gilad-Bachrach, R.; Dowlin, N.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. Cryptonets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy. In Proceedings of the International Conference on Machine Learning (ICML’16), New York, NY, USA, 20–22 June 2016; pp. 201–210. [Google Scholar]
  111. Azure. Enterprise Security and Governance for Azure Machine Learning. Available online: (accessed on 12 November 2021).
  112. Meng, X. Machine Learning Models that Act on Encrypted Data. Available online: (accessed on 12 November 2021).
  113. Zhu, L.; Tang, X.; Shen, M.; Gao, F.; Zhang, J.; Du, X. Privacy-Preserving Machine Learning Training in IoT Aggregation Scenarios. IEEE Internet Things J. 2021, 8, 12106–12118. [Google Scholar] [CrossRef]
  114. Ryffel, T.; Dufour-Sans, E.; Gay, R.; Bach, F.; Pointcheval, D. Partially Encrypted Machine Learning using Functional Encryption. In Proceedings of the 33rd Conference Neural Information Processing Systems (NIPS’19), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  115. Abdalla, M.; Bourse, F.; Caro, A.D.; Pointcheval, D. Simple Functional Encryption Schemes for Inner Products. In Proceedings of the IACR International Workshop on Public Key Cryptography, Gaithersburg, MD, USA, 30 March–1 April 2015; pp. 733–751. [Google Scholar]
  116. Boneh, D.; Franklin, M. Identity-based Encryption from the Weil Pairing. In Proceedings of the Annual International Cryptology Conference (CRYPTO’01), Santa Barbara, CA, USA, 19–23 August 2001; pp. 213–229. [Google Scholar]
  117. Xu, R.; Joshi, J.B.; Li, C. Cryptonn: Training Neural Networks over Encrypted Data. In Proceedings of the 39th International Conference on Distributed Computing Systems (ICDCS’19), Dallas, TX, USA, 7–9 July 2019; pp. 1199–1209. [Google Scholar]
  118. Chrysos, G.G.; Moschoglou, S.; Bouritsas, G.; Deng, J.; Panagakis, Y.; Zafeiriou, S.P. Deep Polynomial Neural Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4021–4034. [Google Scholar] [CrossRef]
  119. Song, Q.; Cao, J.; Sun, K.; Li, Q.; Xu, K. Try before You Buy: Privacy-preserving Data Evaluation on Cloud-based Machine Learning Data Marketplace. In Proceedings of the Annual Computer Security Applications Conference (ACSAC’21), Austin, TX, USA, 6–10 December 2021; pp. 260–272. [Google Scholar]
  120. Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 3–4. [Google Scholar]
  121. Kenthapadi, K.; Mironov, I.; Thakurta, A.G. Privacy-preserving data mining in industry. In Proceedings of the WSDM 2019, Melbourne, Australia, 11–15 February 2019; pp. 840–841. [Google Scholar]
  122. Huang, C.; Kairouz, P.; Chen, X.; Sankar, L.; Rajagopal, R. Context-aware generative adversarial privacy. Entropy 2017, 19, 656. [Google Scholar] [CrossRef]
  123. Kopp, A. Microsoft Smartnoise Differential Privacy Machine Learning Case Studies; Microsoft Azure White Papers; Microsoft Corporation: Redmond, WA, USA, 2021. [Google Scholar]
  124. Wu, J.; Dang, Y.; Jia, H.; Liu, X.; Lv, Z. Prediction of Energy Consumption in Digital Twins of Intelligent Factory by Artificial Intelligence. In Proceedings of the 2021 International Conference on Technology and Policy in Energy and Electric Power (ICT-PEP’21), Yogyakarta, Indonesia, 29–30 September 2021; pp. 354–359. [Google Scholar]
  125. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep Learning with Differential Privacy. In Proceedings of the Conference on Computer and Communications Security (SIGSAC’16), Vienna, Austria, 24–28 October 2016; pp. 308–318. [Google Scholar]
  126. Papernot, N.; Song, S.; Mironov, I.; Raghunathan, A.; Talwar, K.; Erlingsson, U. Scalable Private Learning with PATE. In Proceedings of the 6th International Conference on Learning Representations (ICLR’18), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  127. Gong, M.; Xie, Y.; Pan, K.; Feng, K.; Qin, A.K. A Survey on Differentially Private Machine Learning. IEEE Comput. Intell. Mag. 2020, 15, 49–64. [Google Scholar] [CrossRef]
  128. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for Improving Communication Efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
  129. Chen, M.; Yang, Z.; Saad, W.; Yin, C.; Poor, H.V.; Cui, S. A Joint Learning and Communications Framework for Federated Learning over Wireless Networks. arXiv 2020, arXiv:1909.07972. [Google Scholar] [CrossRef]
  130. Hao, M.; Li, H.; Luo, X.; Xu, G.; Yang, H.; Liu, S. Efficient and Privacy-Enhanced Federated Learning for Industrial Artificial Intelligence. IEEE Trans. Ind. Inform. 2019, 16, 6532–6542. [Google Scholar] [CrossRef]
  131. Bagheri, B.; Rezapoor, M.; Lee, J. A Unified Data Security Framework for Federated Prognostics and Health Management in Smart Manufacturing. Manuf. Lett. 2020, 24, 136–139. [Google Scholar] [CrossRef]
  132. Wei, K.; Li, J.; Ding, M.; Ma, C.; Yang, H.H.; Farokhi, F.; Jin, S.; Quek, T.Q.; Poor, H.V. Federated Learning with Differential Privacy: Algorithms and Performance Analysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3454–3469. [Google Scholar] [CrossRef]
  133. Zhou, W.; Li, Y.; Chen, S.; Ding, B. Real-Time Data Processing Architecture for Multi-Robots Based on Differential Federated Learning. In Proceedings of the SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI 2018, Guangzhou, China, 8–12 October 2018; pp. 462–471. [Google Scholar]
  134. Vepakomma, P.; Gupta, O.; Swedish, T.; Raskar, R. Split Learning for Health: Distributed Deep Learning without Sharing Raw Patient Data. arXiv 2018, arXiv:1812.00564. [Google Scholar]
  135. Abuadbba, S.; Kim, K.; Kim, M.; Thapa, C.; Camtepe, S.A.; Gao, Y.; Kim, H.; Nepal, S. Can We Use Split Learning on 1D CNN Models for Privacy Preserving Training? arXiv 2020, arXiv:2003.12365. [Google Scholar]
  136. Saria, S.; Subbaswamy, A. Tutorial: Safe and Reliable Machine Learning. arXiv 2019, arXiv:1904.07204. [Google Scholar]
  137. Ren, P.; Xiao, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Comput. Surv. (CSUR) 2021, 54, 1–34. [Google Scholar] [CrossRef]
  138. Dreyfus, P.A.; Pélissier, A.; Psarommatis, F.; Kiritsis, D. Data-based Model Maintenance in the Era of Industry 4.0: A Methodology. J. Manuf. Syst. 2022, 63, 304–316. [Google Scholar] [CrossRef]
  139. Lu, J.; Liu, A.; Dong, F.; Gu, F.; Gama, J.; Zhang, G. Learning under Concept Drift: A Review. IEEE Trans. Knowl. Data Eng. 2018, 31, 2346–2363. [Google Scholar] [CrossRef]
  140. Kabir, M.A.; Keung, J.W.; Bennin, K.E.; Zhang, M. Assessing the Significant Impact of Concept Drift in Software Defect Prediction. In Proceedings of the 43rd Annual Computer Software and Applications Conference (COMPSAC’19), Milwaukee, WI, USA, 15–19 July 2019; Volume 1, pp. 53–58. [Google Scholar] [CrossRef]
  141. Zenisek, J.; Holzinger, F.; Affenzeller, M. Machine Learning-based Concept Drift Detection for Predictive Maintenance. Comput. Ind. Eng. 2019, 137, 106031. [Google Scholar] [CrossRef]
  142. Bermeo-Ayerbe, M.A.; Ocampo-Martinez, C.; Diaz-Rozo, J. Data-driven Energy Prediction Modeling for Both Energy Efficiency and Maintenance in Smart Manufacturing Systems. Energy 2022, 238, 121691. [Google Scholar] [CrossRef]
  143. Gal, Y. Uncertainty in Deep Learning. Ph.D. Dissertation, Cambridge University, Cambridge, UK, 2016. [Google Scholar]
  144. Peng, W.; Ye, Z.S.; Chen, N. Bayesian Deep-Learning-Based Health Prognostics Toward Prognostics Uncertainty. IEEE Trans. Ind. Electron. 2019, 67, 2283–2293. [Google Scholar] [CrossRef]
  145. Benker, M.; Furtner, L.; Semm, T.; Zaeh, M.F. Utilizing Uncertainty Information in Remaining Useful Life Estimation via Bayesian Neural Networks and Hamiltonian Monte Carlo. J. Manuf. Syst. 2021, 61, 799–807. [Google Scholar] [CrossRef]
  146. Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 6402–6413. [Google Scholar]
  147. Nemani, V.P.; Lu, H.; Thelen, A.; Hu, C.; Zimmerman, A.T. Ensembles of Probabilistic LSTM Predictors and Correctors for Bearing Prognostics using Industrial Standards. Neurocomputing 2022, 491, 575–596. [Google Scholar] [CrossRef]
  148. DeVries, T.; Taylor, G.W. Learning Confidence for Out-of-distribution Detection in Neural Networks. arXiv 2018, arXiv:1802.04865. [Google Scholar]
  149. Hendrycks, D.; Gimpel, K. A Baseline for Detecting Misclassified and Out-of-distribution Examples in Neural Networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR’17), Toulon, France, 24–26 April 2017. [Google Scholar]
  150. Lee, K.; Lee, K.; Lee, H.; Shin, J. A Simple Unified Framework for Detecting Out-of-distribution Samples and Adversarial Attacks. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NIPS’18), Montreal, QC, Canada, 3–8 December 2018; pp. 7167–7177. [Google Scholar]
  151. Bendale, A.; Boult, T.E. Towards Open Set Deep Networks. In Proceedings of the 32nd International Conference on Computer Vision and Pattern Recognition (CVPR’16), Las Vegas, NV, USA, 27–30 June 2016; pp. 1563–1572. [Google Scholar]
  152. Xu, J.; Kovatsch, M.; Lucia, S. Open Set Recognition for Machinery Fault Diagnosis. In Proceedings of the 19th International Conference on Industrial Informatics (INDIN’21), Palma de Mallorca, Spain, 21–23 July 2021; pp. 1–7. [Google Scholar]
  153. Berthelot, D.; Roelofs, R.; Sohn, K.; Carlini, N.; Kurakin, A. AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation. In Proceedings of the 9th International Conference on Learning Representations (ICLR’21), Online, 3–7 May 2021. [Google Scholar]
  154. Wang, Y.; Gu, Q.; Brown, D. Differentially Private Hypothesis Transfer Learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD’18), Dublin, Ireland, 10–14 September 2018; pp. 811–826. [Google Scholar]
  155. Farrall, F.; Mittal, N.; Narra, C.; Tello, J.; Dow, E.M. Data-Sharing Made Easy; Technical Report; Deloitte: London, UK, 2021. [Google Scholar]
  156. Otto, B.; Mohr, N.; Roggendorf, M.; Guggenberger, T. Data Sharing in Industrial Ecosystems; Technical Report; McKinsey: New York, NY, USA, 2020. [Google Scholar]
  157. Pandey, A.V. Future of AI in Business: Adaptive Machine Learning Models That Evolve and Improve over Time without Expensive Retraining. Available online: (accessed on 24 November 2021).
  158. Cattaneo, G.; Vanara, F.; Massaro, A. Advanced Technologies for Industry—AT WATCH Technology Focus on Cloud Computing; Technical Report; EU Commission: Brussels, Belgium, 2020. [Google Scholar]
  159. Nakazawa, T.; Kulkarni, D.V. Wafer Map Defect Pattern Classification and Image Retrieval using Convolutional Neural Network. IEEE Trans. Semicond. Manuf. 2019, 31, 309–314. [Google Scholar] [CrossRef]
  160. Labbé, Y.; Carpentier, J.; Aubry, M.; Sivic, J. Single-view Robot Pose and Joint Angle Estimation via Render and Compare. arXiv 2021, arXiv:2104.09359. [Google Scholar]
  161. Priore, P.; Ponte, B.; Rosillo, R.; de la Fuente, D. Applying Machine Learning to the Dynamic Selection of Replenishment Policies in Fast-changing Supply Chain Environments. Int. J. Prod. Res. 2018, 57, 3663–3677. [Google Scholar] [CrossRef]
  162. Liu, F.; Zhong, D. GSOS-ELM: An RFID-Based Indoor Localization System Using GSO Method and Semi-Supervised Online Sequential ELM. Sensors 2018, 18, 1995. [Google Scholar] [CrossRef]
  163. Bellini, M.; Pantalos, G.; Kaspar, P.; Knoll, L.; De-Michielis, L. An Active Deep Learning Method for the Detection of Defects in Power Semiconductors. In Proceedings of the 32nd Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC’21), Milpitas, CA, USA, 10–12 May 2021. [Google Scholar]
  164. Wilde, N.; Kulić, D.; Smith, S.L. Bayesian Active Learning for Collaborative Task Specification using Equivalence Regions. IEEE Robot. Autom. Lett. 2019, 4, 1691–1698. [Google Scholar] [CrossRef]
  165. Ma, P.; Zhang, H.; Fan, W.; Wang, C.; Wen, G.; Zhang, X. A Novel Bearing Fault Diagnosis Method Based on 2D Image Representation and Transfer Learning Convolutional Neural Network. Meas. Sci. Technol. 2019, 30, 055402. [Google Scholar] [CrossRef]
  166. Xiao, D.; Huang, Y.; Zhao, L.; Qin, C.; Shi, H.; Liu, C. Domain Adaptive Motor Fault Diagnosis using Deep Transfer Learning. IEEE Access 2019, 7, 80937–80949. [Google Scholar] [CrossRef]
  167. Cai, J.; Zhang, Z.; Cheng, H. Grasping Novel Objects by Semi-supervised Domain Adaptation. In Proceedings of the Conference on Real-Time Computing and Robotics (RCAR’19), Irkutsk, Russia, 4–9 August 2019; pp. 626–631. [Google Scholar]
  168. Nguyen, T.T.; Hatua, A.; Sung, A.H. Cumulative Training and Transfer Learning for Multi-Robots Collision-Free Navigation Problems. In Proceedings of the 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON’19), New York, NY, USA, 10–12 October 2019; pp. 305–311. [Google Scholar]
  169. Oh, Y.; Ransikarbum, K.; Busogi, M.; Kwon, D.; Kim, N. Adaptive SVM-based Real-time Quality Assessment for Primer-sealer Dispensing Process of Sunroof Assembly Line. Reliab. Eng. Syst. Saf. 2019, 184, 202–212. [Google Scholar] [CrossRef]
  170. Mera, C.; Orozco-Alzate, M.; Branch, J. Incremental Learning of Concept Drift in Multiple Instance Learning for Industrial Visual Inspection. Comput. Ind. 2019, 109, 153–164. [Google Scholar] [CrossRef]
  171. Yu, W.; Zhao, C. Broad Convolutional Neural Network-based Industrial Process Fault Diagnosis with Incremental Learning Capability. IEEE Trans. Ind. Electron. 2019, 67, 5081–5091. [Google Scholar] [CrossRef]
  172. Raj, A.; Majumder, A.; Kumar, S. HiFI: A Hierarchical Framework for Incremental Learning using Deep Feature Representation. In Proceedings of the 28th International Conference on Robot and Human Interactive Communication (RO-MAN’19), New Delhi, India, 14–18 October 2019; pp. 1–6. [Google Scholar]
  173. Kong, Q.; Lu, R.; Yin, F.; Cui, S. Privacy-Preserving Continuous Data Collection for Predictive Maintenance in Vehicular Fog-Cloud. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5060–5070. [Google Scholar] [CrossRef]
  174. Chen, Y.; Ping, Y.; Zhang, Z.; Wang, B.; He, S. Privacy-preserving Image Multi-classification Deep Learning Model in Robot System of Industrial IoT. Neural Comput. Appl. 2021, 33, 4677–4694. [Google Scholar] [CrossRef]
  175. Chen, Y.; Wang, B.; Zhang, Z. PDLHR: Privacy-Preserving Deep Learning Model with Homomorphic Re-Encryption in Robot System. IEEE Syst. J. 2021. [Google Scholar] [CrossRef]
  176. Khan, P.W.; Byun, Y.C.; Park, N. IoT-Blockchain Enabled Optimized Provenance System for Food Industry 4.0 Using Advanced Deep Learning. Sensors 2020, 20, 2990. [Google Scholar] [CrossRef] [PubMed]
  177. Zheng, X.; Cai, Z. Privacy-Preserved Data Sharing towards Multiple Parties in Industrial IoTs. IEEE J. Sel. Areas Commun. 2020, 38, 968–979. [Google Scholar] [CrossRef]
  178. Han, X.; Yu, H.; Gu, H. Visual Inspection with Federated Learning. In Proceedings of the 16th International Conference on Image Analysis and Recognition (ICIAR’19), Waterloo, ON, Canada, 27–29 August 2019; pp. 52–64. [Google Scholar]
  179. Kanagavelu, R.; Li, Z.; Samsudin, J.; Yang, Y.; Yang, F.; Goh, R.S.M.; Cheah, M.; Wiwatphonthana, P.; Akkarajitsakul, K.; Wangz, S. Two-Phase Multi-Party Computation Enabled Privacy-Preserving Federated Learning. arXiv 2020, arXiv:2005.11901. [Google Scholar]
  180. Lim, H.K.; Kim, J.B.; Heo, J.S.; Han, Y.H. Federated Reinforcement Learning for Training Control Policies on Multiple IoT Devices. Sensors 2020, 20, 1359. [Google Scholar] [CrossRef]
  181. Yin, F.; Lin, Z.; Xu, Y.; Kong, Q.; Li, D.; Theodoridis, S. FedLoc: Federated Learning Framework for Cooperative Localization and Location Data Processing. arXiv 2020, arXiv:2003.03697. [Google Scholar] [CrossRef]
  182. Diez-Olivan, A.; Ortego, P.; Del Ser, J.; Landa-Torres, I.; Galar, D.; Camacho, D.; Sierra, B. Adaptive Dendritic Cell-deep Learning Approach for Industrial Prognosis under Changing Conditions. IEEE Trans. Ind. Inform. 2021, 17, 7760–7770. [Google Scholar] [CrossRef]
  183. Maisenbacher, M.; Weidlich, M. Handling Concept Drift in Predictive Process Monitoring. SCC 2017, 17, 1–8. [Google Scholar]
  184. Sajedi, S.O.; Liang, X. Uncertainty-assisted Deep Vision Structural Health Monitoring. Comput.-Aided Civ. Infrastruct. Eng. 2020, 36, 126–142. [Google Scholar] [CrossRef]
  185. Li, Y.T.; Kuo, P.; Guo, J.I. Automatic Industry PCB Board DIP Process Defect Detection with Deep Ensemble Method. In Proceedings of the 29th International Symposium on Industrial Electronics (ISIE’20), Delft, The Netherlands, 17–19 June 2020; pp. 453–459. [Google Scholar]
  186. Godefroy, G.; Arnal, B.; Bossy, E. Compensating for Visibility Artefacts in Photoacoustic Imaging with A Deep Learning Approach Providing Prediction Uncertainties. Photoacoustics 2020, 21, 100218. [Google Scholar] [CrossRef] [PubMed]
  187. Lu, Y.; Wang, Z.; Xie, R.; Liang, S. Bayesian Optimized Deep Convolutional Network for Electrochemical Drilling Process. J. Manuf. Mater. Process. 2019, 3, 57. [Google Scholar] [CrossRef]
  188. Thakur, S.; van Hoof, H.; Higuera, J.C.G.; Precup, D.; Meger, D. Uncertainty aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks. In Proceedings of the International Conference on Robotics and Automation (ICRA’19), Montreal, QC, Canada, 20–24 May 2019; pp. 768–774. [Google Scholar]
  189. Miller, D.; Dayoub, F.; Milford, M.; Sünderhauf, N. Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection. In Proceedings of the International Conference on Robotics and Automation (ICRA’19), Montreal, QC, Canada, 20–24 May 2019; pp. 2348–2354. [Google Scholar]
  190. Mercier, S.; Uysal, I. Neural Network Models for Predicting Perishable Food Temperatures along the Supply Chain. Biosyst. Eng. 2018, 171, 91–100. [Google Scholar] [CrossRef]
Figure 1. Although often used synonymously, deep learning is only a subfield of AI. The latter also includes other promising techniques for smart manufacturing, such as knowledge graphs.
Figure 2. (a) The neuron output $y$ is the weighted sum of the elements in the input vector $x$ and the bias $b$, followed by a non-linear activation function $\varphi(\cdot)$. Activation functions are normally Sigmoid, Tanh, or ReLU [38]. (b) A toy neural network with three layers, in which neuron outputs pass in the forward direction indicated by the blue arrows. $\omega_n$ are the weights.
Figure 3. Graphical illustration of a two-dimensional non-convex loss function $L(p)$. Red arrows indicate the directions of parameter updating while searching for the minima.
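The parameter updates described in the Figure 3 caption can be sketched with plain gradient descent. This is a minimal illustration, not the loss surface shown in the figure: the function `loss`, its coefficients, the learning rate, and the starting points are all assumptions chosen so that the surface is non-convex with two minima.

```python
def loss(p1, p2):
    # Hypothetical non-convex loss with two minima along p1 (assumed for illustration).
    return p1**4 - 2.0 * p1**2 + 0.3 * p1 + p2**2


def gradient(p1, p2):
    # Analytic partial derivatives of the loss above.
    return (4.0 * p1**3 - 4.0 * p1 + 0.3, 2.0 * p2)


def gradient_descent(start, learning_rate=0.05, steps=200):
    # Repeatedly step against the gradient, as the red arrows in Figure 3 suggest.
    p1, p2 = start
    for _ in range(steps):
        g1, g2 = gradient(p1, p2)
        p1 -= learning_rate * g1
        p2 -= learning_rate * g2
    return p1, p2


# Two different starting points end up in two different minima.
right = gradient_descent((1.5, 1.0))
left = gradient_descent((-1.5, 1.0))
print("from (1.5, 1.0): ", right, "loss =", loss(*right))
print("from (-1.5, 1.0):", left, "loss =", loss(*left))
```

In this sketch, which minimum is reached depends only on the starting point, which is why non-convex losses make the result of training sensitive to initialization.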
Figure 4. Graphical illustration of back-propagation with a three-layer fully connected neural network as an example. $x = [x_1, x_2]$ are the inputs, and $y = [y_1, y_2]$ are the outputs. $\omega = [\omega_1, \omega_2, \dots]$ are the weights, and $b = [b_1, b_2, b_3, b_4]$ are the biases. $L(y)$ is the loss function. $z = [z_1, z_2, z_3, z_4]$ are the intermediate neuron inputs, for example $z_3 = \omega_3 \cdot \sigma(z_1) + \omega_4 \cdot \sigma(z_2) + b_3$, where $\sigma(\cdot)$ is the activation function. Normally, the gradients are calculated from the last to the first layer. For example, according to the chain rule, $\frac{\partial L(y)}{\partial \omega_3} = \frac{\partial L(y)}{\partial z_3} \frac{\partial z_3}{\partial \omega_3}$ and $\frac{\partial L(y)}{\partial z_3} = \frac{\partial L(y)}{\partial y_1} \frac{\partial y_1}{\partial z_3}$. Here, $\frac{\partial L(y)}{\partial y_1}$ is the gradient of the loss function, and it is easy to find that $\frac{\partial y_1}{\partial z_3} = \sigma'(z_3)$ and $\frac{\partial z_3}{\partial \omega_3} = \sigma(z_1)$. Therefore, $\frac{\partial L(y)}{\partial \omega_3} = \sigma'(z_3) \cdot \sigma(z_1) \cdot \frac{\partial L(y)}{\partial y_1}$ (the path is highlighted in blue).
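The chain-rule result for the weight $\omega_3$ in the Figure 4 caption can be checked numerically. The sketch below is only an illustration under stated assumptions: the activation is taken to be the sigmoid, the loss is taken to be a squared error on $y_1$ alone, and all numeric values (`w3`, `w4`, `b3`, `z1`, `z2`, `target`) are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    # Derivative of the sigmoid, sigma'(z) = sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)


def forward_loss(w3, w4, b3, z1, z2, target):
    # z3 = w3 * sigma(z1) + w4 * sigma(z2) + b3, y1 = sigma(z3).
    z3 = w3 * sigmoid(z1) + w4 * sigmoid(z2) + b3
    y1 = sigmoid(z3)
    # Assumed loss: squared error on y1 only.
    return 0.5 * (y1 - target) ** 2


def grad_w3(w3, w4, b3, z1, z2, target):
    # Chain rule: dL/dw3 = sigma'(z3) * sigma(z1) * dL/dy1.
    z3 = w3 * sigmoid(z1) + w4 * sigmoid(z2) + b3
    y1 = sigmoid(z3)
    dL_dy1 = y1 - target
    return sigmoid_prime(z3) * sigmoid(z1) * dL_dy1


# Made-up values for illustration.
w3, w4, b3, z1, z2, target = 0.5, -0.3, 0.1, 0.8, -0.4, 1.0

analytic = grad_w3(w3, w4, b3, z1, z2, target)

# Central finite difference as an independent check of the analytic gradient.
eps = 1e-6
numeric = (
    forward_loss(w3 + eps, w4, b3, z1, z2, target)
    - forward_loss(w3 - eps, w4, b3, z1, z2, target)
) / (2.0 * eps)

print("analytic:", analytic)
print("numeric: ", numeric)
```

The two values should agree up to the finite-difference error, which is the usual way to sanity-check a hand-derived back-propagation formula.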
Figure 5. (a) A toy CNN example. Only neighboring neurons are connected to the neuron in the next layer (highlighted in red), whereas in fully connected layers all neurons in the previous layer are connected to the neuron in the next layer (highlighted in green). (b) A simplified example of an RNN cell. The network inputs are the combination of past outputs (indicated using arrows) and the new input samples (highlighted in red). Some of the output neurons are network outputs (highlighted in green).
Figure 6. Basic topology of an AE. The latent values contain the key information for generating the original inputs.
Figure 7. Functional components of a vanilla GAN.
Figure 8. Illustration of SSL with a generative model. All samples are mapped to the latent space (colored points indicate labeled samples from different classes, whereas uncolored points are unlabeled samples). The unlabeled samples are classified according to their distances (dashed lines in the graph) to the labeled samples (cf. clustering). The example point should be green, as the closest labeled neighbors are green.
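The distance-based labeling described in the Figure 8 caption can be sketched as a nearest-labeled-neighbor vote in the latent space. This is a simplified illustration, not the generative model itself: the latent coordinates, the class names, the Euclidean distance, and the choice of `k = 3` neighbors are all assumptions.

```python
import math
from collections import Counter


def classify_by_neighbors(point, labeled, k=3):
    # Sort the labeled latent points by Euclidean distance to the query point
    # and let the k closest ones vote on the class.
    nearest = sorted(labeled, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up latent coordinates of labeled samples from two classes.
labeled = [
    ((0.0, 0.0), "green"),
    ((0.2, 0.1), "green"),
    ((0.1, 0.3), "green"),
    ((2.0, 2.0), "blue"),
    ((2.2, 1.9), "blue"),
    ((1.9, 2.3), "blue"),
]

# An unlabeled sample whose closest labeled neighbors are all green.
example = (0.4, 0.4)
predicted = classify_by_neighbors(example, labeled)
print(predicted)
```

In practice the latent coordinates would come from the encoder of the trained generative model; only the voting step is shown here.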
Figure 9. Illustration of graph-based SSL. The graph is constructed with the data samples as nodes and their numerical distances as edges; the samples with check marks are the labeled ones. Sub-graphs (an example is marked in red) are the GNN inputs, and the outputs are the node labels (a screw in the graph above).
Figure 10. In parameter-based TL, the model is first trained on relevant datasets (left), and the layers with the corresponding model parameters (red dashed box) are transferred to a target model (purple dashed box), followed by fine-tuning.
Figure 11. Illustration of homomorphic (a) and functional (b) encryption for deep learning model training/inference on remote servers.
Table 1. Review articles on deep learning in smart manufacturing that we found in the literature.

| Contents | Survey Articles |
| --- | --- |
| Deep learning basics and list of use cases | Deep learning in industry 4.0—brief overview [22] |
| Deep learning basics and list of use cases | Deep learning for smart manufacturing: methods and applications [23] |
| Deep learning basics and list of use cases | Data analytics and machine learning for smart process manufacturing: recent advances and perspectives in the big data era [24] |
| Machine learning basics and use case categories in smart manufacturing | Machine Learning for industrial applications: A comprehensive literature review [25] |
| Machine learning basics and use case categories in smart manufacturing | Machine learning and data mining in manufacturing [26] |
| Categorization of machine learning applications in smart manufacturing | A survey of the advancing use and development of machine learning in smart manufacturing [27] |
| Machine learning use cases in machining process | Smart machining process using machine learning: a review and perspective on machining industry [28] |
| Deep learning for predictive maintenance | Machine learning in predictive maintenance towards sustainable smart manufacturing in industry 4.0 [29] |
| Deep learning for predictive maintenance | A survey of predictive maintenance: systems, purposes and approaches [20] |
| Deep learning for machinery tool monitoring | A review on deep learning in machining and tool monitoring: methods, opportunities, and challenges [30] |
| Deep learning for smart logistics | A review of further directions for artificial intelligence, machine learning, and deep learning in smart logistics [19] |
| Deep learning for production process optimization | A review of machine learning for the optimization of production processes [31] |
| Deep learning for additive manufacturing | Machine learning in additive manufacturing: state-of-the-art and perspectives [32] |
| Deep learning for defect detection | Using deep learning to detect defects in manufacturing: a comprehensive survey and current challenges [33] |
| Deep learning for smart grid | Machine learning and deep learning in smart manufacturing: the smart grid paradigm [34] |
| Edge computing for deep learning in smart manufacturing | Deep learning for edge computing applications: a state-of-the-art survey [35] |
| Software development for deep learning in smart manufacturing | Large-scale machine learning systems in real-world industrial settings: a review of challenges and solutions [21] |
| IoT for deep learning in smart manufacturing | A survey on deep learning empowered IoT applications [36] |
Table 2. List of commonly used deep learning models and the related use case examples in smart manufacturing.

| Deep Learning Models | Brief Introduction | Examples |
| --- | --- | --- |
| Convolutional Neural Network (CNN) | Neural networks containing convolutional kernels. Usually used for 2D data, such as images in visual inspection. | [10,50,51] |
| Recurrent Neural Network (RNN) | Neural networks containing recurrent cells. Usually used for data streams, such as sensory stream data analysis. | [44,52,53] |
| AutoEncoder (AE) | AEs are usually used for feature extraction since they can learn the essential information for data reconstruction. AEs are trained in an unsupervised fashion. | [54,55,56] |
| Generative Adversarial Neural Network (GAN) | GANs can learn the statistical distributions of the training data in an unsupervised way. Therefore, GANs are often used for anomaly detection. | [57,58,59] |
| Transformer | Transformers learn to weight different parts of the input according to their importance. Transformers were originally used for data streams. | [47,48,60] |
Table 3. Privacy-preserving machine learning (PPML) techniques and their usages. Data: data preparation and storage; Model: model training and inference; Architecture: deep model architectures.

| PPML Techniques | Applied Scenarios | Applied Objects |
| --- | --- | --- |
| Elimination-based Approaches | Cloud | Data |
| Homomorphic Encryption | Cloud | Data, Model |
| Functional Encryption | Cloud | Data, Model |
| Differential Privacy | Cloud, Edge | Data, Model |
| Federated Learning | Edge | Architecture |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Xu, J.; Kovatsch, M.; Mattern, D.; Mazza, F.; Harasic, M.; Paschke, A.; Lucia, S. A Review on AI for Smart Manufacturing: Deep Learning Challenges and Solutions. Appl. Sci. 2022, 12, 8239.