With the rapid deployment of sensors and actuators, the amount of quantitative data they produce has been growing rapidly. Analyzing these data and extracting the underlying knowledge poses a challenge. Machine learning methods have been widely used in industrial, medical and scientific data processing. Machine learning techniques provide novel ways of analyzing sensor data, so that the meaning of new data can be precisely interpreted based on what has been learned from past sensor data. Emerging technologies have demonstrated the integration of machine learning functions with physical/chemical sensors. Machine learning has now become one of the mandatory building blocks in a wireless sensor network [
1]. There have been many investigations on using convolutional neural networks (CNNs) to process data from various sensors. CNNs have been used for the retrieval of land surface temperatures from microwave sensors [
2]. It has also been utilized in a fall detection system, which consists of many types of sensors such as accelerometers, acoustic sensors, and wearable sensors [
3]. Here is a case where CNN technology can become part of a smart sensing mechanism. Traditional imaging sensors (e.g., photodetectors) convert incident light into electronic signals. With the CNN technique, photo sensing technology could be improved to become closed-loop and adaptive, i.e., the electrical signals converted from light can be read by a CNN, and the CNN's results can then be fed back to the sensor to adjust the region of interest, sensitivity, etc. There are more examples that show the seamless link between the reported technology and various sensing technologies. Machine learning tasks can generally be divided into supervised learning, unsupervised learning and reinforcement learning. In supervised learning tasks, the algorithm builds a model from a set of data that contains both the input and the desired output. The tasks in this article are supervised learning tasks, with each data specimen having been assigned a category by experts. Classic machine learning methods include support vector machines (SVMs), boosting, random forests, k-nearest neighbors and artificial neural networks (ANNs). A support vector machine with an appropriate kernel function can solve non-linear classification problems. With many-fold cross validation, an SVM can tackle classification problems involving multiple classes. Using a Gaussian radial basis function as the kernel function, the authors of [
4] modeled the I-V characteristics of gas sensors using support vector regression (SVR), with temperature and gas concentration as input variables. However, a 3-layer feedforward ANN predicted the I-V characteristics of the gas sensor model with much higher accuracy than SVR when examined with experimental data. Using principal component analysis (PCA) as the feature selection method, the authors of [
5] employed SVM for classifying multi-sensor data in the prediction of high-power laser welding status. A back-propagation neural network has been shown to correctly predict the average particle size of TiO2 nanosized particle samples from their near-infrared diffuse reflectance spectra [
6]. Artificial intelligence paradigms have shown their ability to deal with pattern association, recognition, classification, optimization and prediction tasks in the realm of nanotechnology, where many of the systems under study are highly underdetermined and several interacting parameters have a strong influence on the results [
7].
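As an illustration of the SVR approach mentioned above, the following sketch fits an RBF-kernel support vector regressor to a synthetic response surface. The data, response function and hyperparameters are invented stand-ins for the gas-sensor measurements of [4], not values from that work.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for gas-sensor I-V data: a hypothetical sensor
# current depending non-linearly on temperature and gas concentration.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))   # columns: temperature, concentration
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2   # assumed smooth response surface

# Support vector regression with a Gaussian radial basis function kernel.
model = SVR(kernel="rbf", C=10.0, gamma="scale", epsilon=0.01)
model.fit(X, y)

pred = model.predict(X)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

With a sufficiently tight epsilon tube, the RBF kernel lets the regressor follow the non-linear surface closely, which is the property exploited in [4].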
Compared with classic machine learning methods that rely on linear classifiers on top of hand-engineered features, deep learning (DL) has the advantage of being sensitive to relevant minute variations of the input while remaining insensitive to large irrelevant variations [
8]. A deep neural network contains thousands of parameters distributed across its hidden layers, which can be seen as distorting the input in a non-linear way so that the categories become linearly separable by the last layer. Deep learning methods have demonstrated their feature extraction ability for training classic machine learning classifiers such as AdaBoost or SVM [
9,
10,
11]. A DL method utilizing a feedforward structure rendered the highest prediction accuracy against six other machine learning methods when trained on 271 breast cancer samples, each consisting of measured data of 162 metabolites with known chemical structure [
11]. It also successfully learned the top five features that have been proposed as breast cancer biomarkers. The authors of [
12] utilized a deep CNN consisting of six one-dimensional convolutional layers with a filter shape of 1 × 3, interspersed with pooling layers, for classifying simulated data in the form of one-dimensional time-domain signals. The deep CNN achieved higher classification accuracy than SVM, while not being susceptible to bias induced by hand-crafted features. In computational mechanics, the authors of [
13] built a predictive network based on the group method of data handling (GMDH), a self-organizing deep learning method for time series forecasting problems without big-data requirements. The authors used numerical analysis to trace a part of the equilibrium path, and the resulting data were then used for training the predictive network. The trained network demonstrated high accuracy while being much less computationally intensive than the conventional approach based purely on numerical analysis. Deep learning has shown its potential in biological image and forensic image classification [
9,
14,
15,
16]. The authors of [
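The one-dimensional convolution and pooling operations underlying such networks (e.g., the 1 × 3 filters of [12]) can be illustrated with a minimal NumPy sketch; the signal and filter values below are illustrative only, not taken from any cited work.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as in CNN conv layers)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def relu(x):
    return np.maximum(x, 0.0)

# A toy time-domain signal passed through one conv -> ReLU -> pool stage.
signal = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
edge_filter = np.array([-1.0, 0.0, 1.0])   # 1x3 filter, illustrative values

features = max_pool1d(relu(conv1d(signal, edge_filter)))
print(features)   # -> [0. 2. 0.]
```

A deep 1D CNN stacks several such stages, so each successive layer responds to progressively larger temporal patterns in the signal.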
16] used a feedforward deep convolutional network with interspersed convolutional and pooling layers for binary classification of diabetic retinopathy. They also utilized a data augmentation strategy to increase the limited quantity of training images. The authors of [
9] applied deep CNNs to detecting photo-realistic facial images generated by generative adversarial networks (GANs). They interspersed four dropout layers in the CNN to overcome the overfitting resulting from increased network depth. By replacing softmax with an AdaBoost classifier in the CGFace model, the newly formed ada-CGFace model achieved classification accuracy that compared very favorably with the CGFace model alone on a highly imbalanced dataset containing a very small proportion of computer-generated facial images. The AlexNet DNN [
10] was utilized as a feature extraction method, paired with either principal component analysis (PCA)-based or linear discriminant analysis (LDA)-based feature selection, to provide the training features on which a support-vector-machine-based diabetic retinopathy (DR) classifier was trained. The AlexNet DNN-based feature extraction helped the classifier achieve an accuracy of 97.93% when paired with LDA feature selection, higher than the 95.26% achieved when paired with PCA. Using scale-invariant feature transform (SIFT)-based feature extraction instead yielded a classifier accuracy of 94.4%, confirming the effectiveness of the AlexNet DNN-based feature extraction.
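The feature-extraction-plus-feature-selection pipeline just described can be sketched with scikit-learn. Here randomly generated vectors stand in for AlexNet deep features, and the dimensions and resulting accuracies are illustrative, not those reported in the cited work.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for deep features: two classes separated along a few axes.
rng = np.random.default_rng(0)
n, d = 200, 64                        # 64-dim "deep features", illustrative
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)
X[y == 1, :3] += 2.0                  # class signal in the first 3 dimensions

# PCA-based vs. LDA-based dimensionality reduction feeding an SVM classifier.
pca_svm = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
lda_svm = make_pipeline(LinearDiscriminantAnalysis(n_components=1), SVC(kernel="rbf"))

acc_pca = cross_val_score(pca_svm, X, y, cv=5).mean()
acc_lda = cross_val_score(lda_svm, X, y, cv=5).mean()
print(f"PCA+SVM: {acc_pca:.3f}  LDA+SVM: {acc_lda:.3f}")
```

Because LDA is supervised, it projects directly onto the class-discriminative direction, while PCA keeps the highest-variance directions regardless of labels; this difference mirrors the accuracy gap observed in the cited study.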
In the next section, we briefly describe the CNN models that we used. In
Section 3, we describe the experimental details and results of atmospheric pollutant classification with multi-sensor data. In
Section 4 and
Section 5, we elaborate on the methods and results of studying the influence of hyperparameter tweaking and network structure on CNN performance.