Article

Analysis of Airglow Image Classification Based on Feature Map Visualization

Zhishuang Lin, Qianyu Wang and Chang Lai
1 School of Science, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 Merton College, University of Oxford, Oxford OX1 4JD, UK
3 State Key Laboratory of Space Weather, National Space Science Center, Beijing 100190, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(6), 3671; https://doi.org/10.3390/app13063671
Submission received: 25 January 2023 / Revised: 19 February 2023 / Accepted: 10 March 2023 / Published: 13 March 2023
(This article belongs to the Special Issue Deep Learning Technology in Earth Environment)

Abstract

All-sky airglow imagers (ASAIs) are used in the Meridian Project to observe the airglow in the middle and upper atmosphere and thereby study atmospheric perturbations. However, the airglow ripples caused by these perturbations are only visible in images taken on clear nights. Effectively selecting images suitable for scientific analysis from the enormous number of airglow images captured under various conditions is a challenge, given the low efficiency and subjectivity of traditional manual classification. We trained a classification model based on a convolutional neural network to distinguish airglow images of clear nights from those of unclear nights. The dataset contains 1688 images selected from the airglow images captured at Xinglong station (40.4° N, 30.5° E). The entire training process was tracked by feature maps that visualized every resulting classification model, and the models with the clearest feature maps were saved for future use. Guided by the feature maps of our first training, we cropped the central part of the airglow images to avoid disturbance from the artificial lights at the edge of the field of view. The accuracy of the saved model is 99%, and the feature maps of the five categories further indicate the reliability of the classification model.

1. Introduction

The all-sky airglow imager (ASAI) is a ground-based camera with a fisheye lens [1]. ASAIs are widely utilized to track perturbations in the upper atmosphere by capturing nocturnal airglow images with high resolution, such as atmospheric gravity waves in the OH (hydroxyl radical) airglow layer (~87 km) [2,3,4] and travelling ionospheric disturbances in the OI (oxygen atom) airglow layer (~250 km) [5,6,7]. Airglow is invisible when there are clouds or light pollution (moonlight, twilight, and artificial light). Hence, it is necessary to sort out the airglow images taken on clear, starry nights for further image processing. A total of 15 ASAIs deployed at different sites have been constructed as part of the Meridian Project [8], capturing about 10,000 airglow images every night. Manual categorization of images is not only extremely time-consuming but also subject to human errors. Thus, reliable automation of airglow image categorization is an urgent problem to be solved.
Machine learning has proved its ability in image classification [9,10,11,12]. The convolutional neural network (CNN) is one of the most efficient networks for categorizing clear and unclear nights with minimal time and subjective error. Convolution and multi-layer structures enable CNNs to maintain high learning efficiency and recognition accuracy while reducing the computation. The CNN was first conceived of in the 1960s by Hubel and Wiesel [13], who proposed the concept of a receptive field and discovered the hierarchical processing mechanism of information in visual cortical pathways based on experiments on the visual cortex cells of cats. In the 1980s, Fukushima and Miyake proposed the neocognitron based on the receptive field, which is the first implementation of CNNs [14]. In 1974, Werbos proposed the backpropagation (BP) algorithm [15], which gained recognition through the works of Rumelhart et al. [16]. In 1990, LeCun et al. [17] applied the BP algorithm to handwritten digit recognition. In 1998, LeCun et al. [18] proposed the “LeNet-5” CNN architecture by summarizing the end-to-end training principle of modular systems, greatly improving the performance of standard handwritten digit recognition. However, shallow machine learning models encountered problems such as local optima, overfitting, and vanishing gradients as the number of network layers increased [19,20,21]. In 2006, Hinton et al. [22,23] demonstrated the excellent feature learning capacity of artificial neural networks with multiple hidden layers. In 2010, Glorot and Bengio [24] posited that the vanishing-gradient problem could be mitigated by normalized initialization. Since then, CNNs have proved to be effective in various fields of visual recognition [25,26,27,28].
Visualization algorithms for neural networks opened the ‘black box’ of deep neural networks, allowing trained models to be analyzed and modified [29,30,31]. Hinton et al. [22] were the first to show that a network can be initialized in a favorable region of parameter space by unsupervised models. Erhan et al. [32] then used gradient ascent in image space to find the input image that maximizes the activity of a neuron of interest, thus visualizing the hidden feature layers of unsupervised deep architectures. To reveal the convolutional layers, Zeiler and Fergus [33] proposed projecting the feature activations back to the input pixel space through a deconvolutional network architecture [34]. Based on the calculation of the gradient of the class score, Simonyan et al. [35] generated an image maximizing the class score, thereby visualizing the notion of the class learned by a CNN.
In our previous research [36], we trained a classification model that divides airglow images into eight categories to filter out the pictures of unclear nights. Because the training process was not monitored, we extracted the model after 30 epochs based on experience. In this paper, we trained an image classifier with higher speed and accuracy based on a CNN to select the airglow images captured by ASAI under ideal weather conditions, aiming to improve the automation of the Meridian Project [37]. The classification model discussed in this paper is an optimized version of the old one, with improved speed and accuracy. Based on feedback from its application, the number of categories was reduced from eight to five to lower the computational cost. With the visualized feature maps of the model generated after every epoch, we tracked the evolution of the classification model and chose the best one for further application. This paper is arranged as follows: the theory of the CNN and the visualization of feature maps are introduced in Section 2. Section 3 presents the data set and training process. Based on the visualized feature maps, the training process and overfitting are discussed in Section 4. The conclusions are given in Section 5.

2. Structure of the Convolutional Neural Network

As a kind of feedforward neural network with convolutional computations and a deep structure, the CNN is one of the most prominent machine learning algorithms for image recognition and performs well given sufficient training data [38]. According to the complexity of airglow image classification, our CNN was designed as a deep network constructed with ten layers (shown in Figure 1): the input layer, the first convolutional layer, the first max-pooling layer, the first dropout layer, the second convolutional layer, the second max-pooling layer, the second dropout layer, the flatten layer, the third dropout layer, and the full connection layer. Among these layers, the convolutional layers and the full connection layer contain trainable parameters, while the max-pooling and dropout layers are governed by fixed hyperparameters (pool size and dropout rate).
The first layer is the input layer, in which every input image is adjusted into a matrix of size 128 × 128 × 1. Since the input images shot by the ASAI are grey-scale images, the matrices in the input layer all have a depth of 1 instead of the 3 used for RGB color images.
The second layer is the first convolutional layer, in which the input matrices are scanned and convolved by 3 × 3 kernels and then added to a bias. The nine elements of each 3 × 3 matrix are the weights of the kernel. The weights and bias are trainable parameters, which are modified after each round of the training process. After the input image is padded with zero elements into a matrix of size 130 × 130 × 1, the kernel scans the 130 × 130 × 1 matrix with a stride of one pixel, covering a 3 × 3 receptive field at each step and generating a new element on the feature map through convolution and the rectified linear unit (ReLU). The new element created by the l-th kernel can be written as:
f_l = \mathrm{ReLU}(b_l + c),
where b_l is the bias of the l-th kernel and c is the result of the convolution. ReLU is widely used in the construction of deep neural networks as the activation function of the neural unit to reduce the training error rate [25], taking effect as:
\mathrm{ReLU}(x) = \max(0, x).
A total of 128 × 128 steps are required for a kernel to traverse all local receptive fields on the matrix. There are 16 kernels applied to every input image in the convolutional layer, resulting in a 128 × 128 × 16 output matrix. The purpose of model training is to find appropriate learnable parameters such that the value of the loss function is close to its minimum. In a convolutional neural network architecture, all neurons in a layer share the same weights and biases, which greatly reduces the number of weight parameters. With convolution kernels, the weight parameters are reduced from 128 × 128 to 3 × 3 in the first convolutional layer. In addition to effectively reducing computational costs, scanning by convolution kernels can focus on the features of local image regions while ignoring their relative positions in the image. Since the positions of typical features, such as the moon and bright stars, rotate with time in the airglow images, extraction by convolution is necessary to avoid the overfitting caused by paying too much attention to the positions of these features.
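As a minimal illustration of the operation described above, the following Python/NumPy sketch (not part of the original implementation; array contents and variable names are chosen only for demonstration) computes the feature map produced by a single 3 × 3 kernel on a zero-padded grey-scale image, applying ReLU to the bias plus the convolution result at every step.

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv_single_kernel(image, kernel, bias):
    # Zero-pad the 128 x 128 image to 130 x 130, then slide the 3 x 3 kernel
    # with a stride of one pixel, computing f_l = ReLU(b_l + c) at each step.
    padded = np.pad(image, 1)
    height, width = image.shape
    feature_map = np.empty((height, width), dtype=np.float32)
    for i in range(height):
        for j in range(width):
            receptive_field = padded[i:i + 3, j:j + 3]
            c = np.sum(receptive_field * kernel)   # convolution result c
            feature_map[i, j] = relu(bias + c)
    return feature_map

image = np.random.rand(128, 128).astype(np.float32)    # placeholder grey-scale input
kernel = np.random.randn(3, 3).astype(np.float32)      # trainable weights of one kernel
print(conv_single_kernel(image, kernel, bias=0.1).shape)  # (128, 128); 16 kernels give 128 x 128 x 16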
The third layer is the first max-pooling layer. During max-pooling, a pooling matrix of size 2 × 2 traverses the feature map with a step length of 2, outputting the maximum element in every 2 × 2 area of the feature map covered by the pooling matrix. Thereby, after max-pooling, the width and length of the feature map are halved, giving an output matrix of 64 × 64 × 16. The max-pooling layer reduces the number of parameters and the required amount of computation by gradually reducing the size of the feature maps, which also controls overfitting to some extent. This mechanism is effective because, once a feature is detected, its precise location is less important than its position relative to other features. Moreover, max-pooling makes the representation invariant to small local translations.
The fourth layer is the first dropout layer, which drops the outputs of neurons in the network at random to prevent overfitting. There are 64 × 64 × 16 neurons in the dropout layer, each connected to the corresponding neuron in the previous max-pooling layer. Every neuron in the dropout layer is assigned a random number between 0 and 1, which is regenerated at every iteration during training. The neuron outputs the data received from the previous layer if its random number is greater than p (set to 0.5); otherwise, the neuron temporarily outputs 0. In 2012, Hinton et al. [39] first proposed dropout as a solution to the problem of overfitting in large feedforward neural networks trained on small training sets. By randomly omitting half of the feature detectors in each training case, overfitting can be greatly reduced, and dropout also prevents complex co-adaptations. Dropout achieves lower errors and cross-entropy, significantly controls overfitting, and makes the method robust to the choice of network architecture. In the same year, Krizhevsky et al. [25] used the dropout algorithm, setting the output of each hidden neuron to zero with a probability of 0.5. In this way, the neurons that are shut off do not contribute to the forward propagation and do not participate in the backward propagation. As a result, every time an input is presented, the neural network samples a different architecture, but all these architectures share the same weights. This technique reduces complex co-adaptations of neurons, as one neuron cannot rely on the existence of specific other neurons.
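The masking behaviour described above can be sketched in a few lines of NumPy (illustrative only; real layers such as the Keras Dropout layer additionally rescale the surviving activations at training time so that the expected output is unchanged):

import numpy as np

def dropout_forward(activations, p=0.5):
    # Assign each neuron a fresh uniform random number; pass its activation
    # through only if the number is greater than p, otherwise output 0.
    rng = np.random.default_rng()
    keep_mask = rng.uniform(size=activations.shape) > p
    return activations * keep_mask

pooled = np.random.rand(64, 64, 16).astype(np.float32)   # output of the max-pooling layer
dropped = dropout_forward(pooled, p=0.5)
print(float(np.mean(dropped == 0)))   # roughly half of the neurons output 0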
The next three layers are the second convolutional layer, the second max-pooling layer, and the second dropout layer. These three layers operate in the same way as layers two to four, except that the second convolutional layer is scanned by 128 kernels of size 5 × 5 × 16.
The eighth layer is a flatten layer, in which the input, a 32 × 32 × 128 matrix, is flattened into a one-dimensional array with 131,072 elements. The output of this layer is then passed through the ninth layer, another dropout layer with the same parameter setting as the seventh layer.
After being filtered by random dropout, the processed data (a 1 × 131,072 array) are input to the full connection layer (the tenth layer). There are five neurons in the full connection layer, each corresponding to a specific category: clear night, overcast sky, light band, moon, and twilight. Every neuron in the full connection layer is connected to all neurons in the previous dropout layer. The value C_u of the u-th neuron in the full connection layer is derived by a weighted summation of all the elements in the input one-dimensional array:
C_u = \sum_{v=1}^{k} w_{u,v} \cdot n_v, \quad u = 1, 2, \ldots, 5; \; k = 131{,}072,
where w_{u,v} is the weight connecting the v-th input element to the u-th neuron in the full connection layer, and n_v is the v-th element of the input array. To express the classification result for an input image m, the output of the full connection layer is normalized by the softmax function, yielding the probability p_m(u) of each candidate category u:
p_m(u) = \frac{e^{C_u}}{\sum_{k=1}^{5} e^{C_k}}.
The output probabilities of all categories of an input image m add up to 1, and the image is classified into the category with the greatest probability. Therefore, the entire CNN process can be represented as a function F:
F(m) = u,
where m is the input image of the CNN and u is the predicted category of m.
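For reference, the architecture described above can be assembled in a few lines of Keras. The sketch below is a minimal reconstruction from the text, not the authors' code; the 'same' padding, the default initializers, and other unstated details are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(num_classes=5):
    # Ten-layer CNN following the structure in Figure 1.
    model = keras.Sequential([
        keras.Input(shape=(128, 128, 1)),                 # input layer (grey-scale)
        layers.Conv2D(16, (3, 3), padding="same",
                      activation="relu"),                 # 128 x 128 x 16
        layers.MaxPooling2D((2, 2)),                      # 64 x 64 x 16
        layers.Dropout(0.5),
        layers.Conv2D(128, (5, 5), padding="same",
                      activation="relu"),                 # 64 x 64 x 128
        layers.MaxPooling2D((2, 2)),                      # 32 x 32 x 128
        layers.Dropout(0.5),
        layers.Flatten(),                                 # 131,072 elements
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # C_u followed by softmax
    ])
    return model

model = build_classifier()
model.summary()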
The loss function is used to measure the discrepancies between the predicted category and the real category. In our experiment, the cross-entropy loss function is applied:
L_m = -\log p_g,
where L_m is the loss of the prediction for an input image m, and p_g is the predicted probability of the true category of the input image m. The cross-entropy reflects the difference between the predicted probability distribution and the true category. A high probability for the true category leads to a low loss and indicates high accuracy of the model prediction. Ideally, the predicted probability of the true category is 1 and the loss is zero.
Besides the loss, accuracy is also used to evaluate the classification model. The accuracy is defined as the proportion of correctly classified samples in the total samples:
\mathrm{Accuracy} = \frac{TP + TN}{\mathrm{Total}},
where TP (true positive) is the number of images of the true category that are predicted as the true category by the model, and TN (true negative) is the number of images of other categories that are predicted as other categories. The cross-entropy loss and accuracy are recorded after every training epoch to represent the overall performance of the classification model being trained.
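The two metrics can be made concrete with a short NumPy sketch (the probabilities and labels below are invented for illustration): the per-image loss is L_m = -log p_g, and the accuracy is the fraction of images whose highest-probability category matches the true one.

import numpy as np

probs = np.array([[0.90, 0.03, 0.03, 0.02, 0.02],   # softmax outputs for three images
                  [0.10, 0.70, 0.10, 0.05, 0.05],
                  [0.25, 0.25, 0.30, 0.10, 0.10]])
true_cat = np.array([0, 1, 0])                       # true category indices

# Cross-entropy loss per image: -log of the probability assigned to the true category.
losses = -np.log(probs[np.arange(len(true_cat)), true_cat])
print(losses.round(3))                 # [0.105 0.357 1.386]

# Accuracy: correctly classified samples divided by the total number of samples.
predicted = probs.argmax(axis=1)
print(np.mean(predicted == true_cat))  # 0.666...; the third image is misclassified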
To visualize the feature maps of classification, we need to invert the scoring standard representing a category into a visual image. The procedure of visualization is to find an L2-regularised image I through a back-propagation algorithm, such that the value C_u is high:
\arg\max_{I} \; C_u(I) - \lambda \lVert I \rVert_2^2,
where λ is the regularization parameter and \lVert I \rVert_2^2 is the squared L2-norm of the image I.
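The sketch below shows one way this optimization can be carried out with gradient ascent in TensorFlow; it is an illustrative reconstruction, not the authors' code. The step size, iteration count, and regularization weight are arbitrary, and for brevity the softmax output of the model is maximized rather than the pre-softmax score C_u, which would require exposing the logits.

import tensorflow as tf

def visualize_category(model, category, steps=200, lr=1.0, lam=1e-4):
    # Start from a random image and ascend the gradient of
    # score(category) - lambda * ||I||^2 with respect to the image.
    image = tf.Variable(tf.random.uniform((1, 128, 128, 1)))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            score = model(image, training=False)[0, category]
            objective = score - lam * tf.reduce_sum(tf.square(image))
        grad = tape.gradient(objective, image)
        image.assign_add(lr * grad)
    return image.numpy()[0, :, :, 0]      # visualized feature map of the category

# Example (assumes `model` from the earlier sketch): feature_map = visualize_category(model, 2)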
Although the purpose of the model is to distinguish the airglow images of clear nights from those of unclear nights, we still subdivide the images of unclear nights into four further types according to the environmental conditions, which reduces the complexity of extracting the features of a single category and simplifies the structure of our CNN. A deeper network does not perform better; in our tests, it performed worse. An added third convolutional layer introduces more convolutional kernels, which greatly expands the dimensions of the data matrix and results in GPU memory overflow and interruption of training. More pooling layers would further reduce the size of the image, but at the cost of eliminating features in the image. Our practice demonstrates that the current CNN architecture ensures high accuracy while requiring only a computational cost affordable on personal computers.

3. Experiment

3.1. Data Sets

The airglow images for training were captured by the ASAI at Xinglong station from 1 January 2012 to 31 December 2012. A single ASAI system includes a fish-eye lens, a near-infrared bandpass filter sensitive to OH emission (715–930 nm), a CCD detector, and an optical imaging system. The fish-eye lens has a 180° field of view. The effective observing radius for OH emission is about 400 km, limited by the increasing distortion at the edge. The ASAI takes a photo every 64 s, accumulating about 600 images per night.
A total of 1688 representative images were manually selected and divided into five categories: clear night, overcast sky, light band, moon, and twilight (shown in Table 1). Among all the labeled images, 1400 images (approximately 83.0%) were used as the training set and the remaining 288 images (17.0%) were used as the validation set. Images with representative shapes and brightness distributions were selected to enhance the generalization of the classification model. To avoid overfitting caused by similar images, at most one photo was selected out of every ten consecutive photos. The number of images assigned to each category depended on the actual number of representative images manually found in that category. For CNNs, including a greater number of diverse images in the training process can greatly enhance the robustness of the model's performance. Rotation of images is a simple yet effective method of data augmentation: the image matrices generated from the existing training data by rotation have a modified arrangement of pixels but the same features for classification. To enlarge the training and validation sets, we included three rotated versions of each image (anticlockwise rotations by 90°, 180°, and 270°) in the training and validation sets, quadrupling the number of images in both sets.
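A minimal NumPy sketch of this augmentation step is given below (illustrative only; the placeholder arrays stand in for the real training images and labels):

import numpy as np

def augment_with_rotations(images, labels):
    # Stack the original images with their 90, 180 and 270 degree
    # anticlockwise rotations; the labels are unchanged by rotation.
    rotated = [np.rot90(images, k=k, axes=(1, 2)) for k in range(4)]
    return np.concatenate(rotated, axis=0), np.tile(labels, 4)

train_x = np.random.rand(1400, 128, 128).astype(np.float32)  # placeholder training images
train_y = np.random.randint(0, 5, size=1400)                 # placeholder category labels
aug_x, aug_y = augment_with_rotations(train_x, train_y)
print(aug_x.shape, aug_y.shape)   # (5600, 128, 128) (5600,): the set is quadrupled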
In the first attempted training, whole images were input to train the image classifier, but the actual classification accuracy was much lower than the theoretical prediction. The fitted parameters of the full connection layer in the model were visualized to illustrate the feature map (shown in Figure 2). In the feature map, there are several pixels of high weight (bright ones) at the edge of the round field of view, suggesting that the model focuses on the artificial lights on the horizon instead of the stars when classifying. To cut out the artificial lights around the image margins and prevent confusion between stars and artificial lights (shown in Figure 3), a square of 512 × 512 pixels was cropped at the center of each original image. Since the 512 × 512 square at the center includes the whole effective observation range, replacing the original image with the cropped region does not lead to information loss.
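The cropping step can be sketched as follows (not the authors' code; the 1024 × 1024 input size follows Figure 2, and the naive strided downsampling to 128 × 128 is only one possible way to match the input layer; an interpolating resize would work equally well):

import numpy as np

def crop_center(image, size=512):
    # Keep the central size x size square of the all-sky image so that
    # artificial lights near the horizon are excluded.
    height, width = image.shape
    top, left = (height - size) // 2, (width - size) // 2
    return image[top:top + size, left:left + size]

raw = np.random.rand(1024, 1024).astype(np.float32)   # placeholder full-size airglow frame
cropped = crop_center(raw)                            # 512 x 512 central region
resized = cropped[::4, ::4]                           # naive 4x downsample to 128 x 128
print(cropped.shape, resized.shape)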

3.2. Learning Process

We built the CNN image classifier with Keras, a deep learning API written in Python that runs on top of the machine learning platform TensorFlow. In every epoch of training, a model was generated after 22 iterations, and the model was modified continuously until 140 epochs of learning had been performed. In each iteration, 64 images were drawn at random from the training set for learning, so that all 1400 images in the training set were covered within 22 iterations. Such random sampling is the strategy employed in the stochastic gradient descent (SGD) method used in LeNet. The prediction accuracy and prediction loss in both the training set and the validation set during the 140 epochs of learning are shown in Figure 4. The prediction accuracy in the validation set rose rapidly from the 1st to the 5th epoch, gradually stabilized and converged to 0.999 from the 5th to the 31st epoch, and finally decreased slowly to 0.994 from the 31st to the 40th epoch. The prediction loss in the training set descended quickly from the beginning and then gradually stabilized and converged to around 0.007 from the 5th to the 40th epoch. The prediction loss in the validation set decreased quickly from the 1st to the 5th epoch, gradually stabilized and converged to around 0.003 from the 5th to the 31st epoch, and finally rose slowly to 0.014 from the 31st to the 40th epoch. The quality of the model fluctuates after the 40th epoch. All the minor fluctuations in the curves were caused by the random sampling.
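A sketch of this training loop is shown below. It is a reconstruction under stated assumptions rather than the authors' script: the optimizer settings and the checkpoint path are hypothetical, the placeholder arrays stand in for the augmented data sets, and build_classifier refers to the model sketch in Section 2. The checkpoint callback keeps the model from every epoch so that a particular epoch (for example, the 31st) can be recovered later.

import numpy as np
import tensorflow as tf

model = build_classifier()   # CNN sketched in Section 2

# Placeholder data standing in for the rotation-augmented sets
# (1400 x 4 training images, 288 x 4 validation images).
train_x = np.random.rand(5600, 128, 128, 1).astype(np.float32)
train_y = np.random.randint(0, 5, size=5600)
val_x = np.random.rand(1152, 128, 128, 1).astype(np.float32)
val_y = np.random.randint(0, 5, size=1152)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Save the model after every epoch so that any epoch's model can be inspected.
checkpoint = tf.keras.callbacks.ModelCheckpoint("airglow_epoch_{epoch:03d}.keras",
                                                save_freq="epoch")

history = model.fit(train_x, train_y,
                    validation_data=(val_x, val_y),
                    batch_size=64, epochs=140,
                    callbacks=[checkpoint])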
The prediction loss in the validation set rose slowly after the 31st epoch, which might be caused by overfitting. After training, the best-fitting model, generated in the 31st epoch, was selected based on the trends of the prediction loss and the feature maps. The image classifier achieved a prediction accuracy of around 100% in the training set and around 99% in the validation set; the prediction loss was around 0.002 in the training set and around 0.003 in the validation set. Despite minor fluctuations caused by random sampling, the convergence of the prediction accuracy and prediction loss in both the training and validation sets is acceptable.
The visualized feature maps of the five categories of this model are shown in Figure 5, which further verifies that the image classifier has successfully focused on the significant features of every category. The feature maps all roughly display 90° rotational symmetry about the center of the map, since the 90°, 180°, and 270° anticlockwise rotated versions of the images were included in both the training and validation sets. Typically, in Figure 5a, the scattered bright dots correspond to the stars in the clear sky; in Figure 5b, the smooth and plain central region indicates the feature of thick cloud cover; in Figure 5c, the tiny intersecting lines form a large vertical intersecting structure in the central region, which reflects the feature of the light band; in Figure 5d, the darker central region and the larger light circles around it reflect the characteristics of the moonlight in the edge region; and in Figure 5e, most areas near the edge are covered by bright features corresponding to the overexposure caused by the sun.

4. Discussion

Residual neural networks (ResNets) and transformers are competitive candidates besides CNNs. ResNets address the problem of vanishing gradients in deep neural networks through the introduction of residual connections between layers and can achieve higher accuracy in large image classification tasks [40,41], but at the risk of overfitting. Transformers have the advantage of being highly flexible and capturing long-range dependencies in images [42], but their computationally intensive self-attention mechanisms make them unsuitable for this application with constrained resources. CNNs are widely utilized for image classification due to their simple but effective structure and remarkable performance [43]. Another important reason we chose a CNN is that we can select and verify the generated classification model by visualizing the feature maps of the CNN.
To track the training process, we visualized the feature map of category c (light band) after every epoch (shown in Figure 6). After 10 epochs, a cross structure can be seen in most feature maps. The cross corresponds to the horizontal and vertical light bands in the original and rotated images of category c. The parts of an image that coincide with the central cross have a high weight in the evaluation of category c; hence, an input image with a light band will be assigned to category c by the classification model. The cross structure becomes clearest near the 31st epoch, which is consistent with the accuracy-loss curves in Figure 4.
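One convenient way to automate this per-epoch tracking is a small Keras callback that runs the activation-maximization routine sketched in Section 2 after every epoch and stores the result. The sketch below is an assumption about how such tracking could be wired up, not the authors' implementation; the output directory and file names are hypothetical.

import os
import numpy as np
import tensorflow as tf

class FeatureMapTracker(tf.keras.callbacks.Callback):
    # Save an activation-maximization feature map for one category after
    # every epoch, using the visualize_category routine from Section 2.
    def __init__(self, category=2, out_dir="feature_maps"):
        super().__init__()
        self.category = category          # e.g. index 2 for 'light band'
        self.out_dir = out_dir
        os.makedirs(out_dir, exist_ok=True)

    def on_epoch_end(self, epoch, logs=None):
        fmap = visualize_category(self.model, self.category)
        np.save(os.path.join(self.out_dir,
                             f"cat{self.category}_epoch{epoch + 1:03d}.npy"), fmap)

# Usage: model.fit(..., callbacks=[checkpoint, FeatureMapTracker()])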
We notice that, owing to the obvious image features and the suitable CNN structure, the model achieved excellent scores (high accuracy and low loss) in the training and validation sets during the training process (shown in Figure 4). However, accuracy and loss alone cannot comprehensively and objectively evaluate the performance of the model. Insufficient or unbalanced data may inflate the accuracy to unrealistically high values. During training, the model loads images randomly from the training set and modifies its learnable parameters to minimize the loss over different batches. If the sample characteristics of a batch are too similar to those of the previous batch, the accuracy of that batch may be too high; the model can easily achieve high accuracy on these similar samples but cannot generalize well to new data. In addition, an inconsistent number of samples in each category will lead to unrealistically high accuracy, because the model tends to assign ambiguous images to the category with more data. Therefore, to evaluate the effectiveness of the classification model, we visualized the feature maps to exhibit the classification criteria, which gives a direct impression of what the model has learned.
After the 31st epoch, the cross structure gradually disappeared up to the 40th epoch, indicating a trend towards overfitting. Overfitting is significant at the 37th epoch, where the loss of the validation set has a peak while the loss of the training set decreases (shown in Figure 4). Accordingly, the feature structure of classification is not obvious in the feature maps of the 37th epoch (shown in Figure 7). Except for category b (‘overcast sky’), the feature maps exhibit similar weight distributions, which will lead to confusion in the actual classification. At the 40th epoch, the significant overfitting was weakened and the feature structures were again visible in the feature maps (shown in Figure 7). However, the central region of the feature map of category e (‘twilight’) exhibits a speckled feature structure similar to that of category a (‘clear night’), which may result in confusion between category e and category a during actual classification.
Neither more training epochs nor a deeper model showed a positive effect on the classification. Despite the overfitting at the 37th epoch, the training process lasted for 140 epochs in order to seek a better model. The loss curve of the validation set did not become more stable but showed sharp peaks at the 55th and 92nd epochs, indicating serious overfitting. The high loss may be caused by the specialization of the classification standard learned from the randomly picked samples, since the loss of the training set is not high. We visualized the feature maps of the 80th epoch (shown in Figure 7), which lies in a stable and smooth segment between the intermittent peaks. Compared with the feature maps of the 31st epoch, those of the 80th epoch do not exhibit more typical characteristics for each category, which indicates that a model generated from more training is not more effective. Another attempt, adding a third conv-pooling-dropout block, also failed to improve the model. The additional pooling and dropout layers led to the disappearance of features in the different categories of images, while training the deeper model with only a third convolutional layer added was interrupted by memory overflow, because the extra convolutional kernels greatly expand the dimensions of the data matrix.
This model still suffers from a tendency to overfit caused by the insufficient number of training images, and consequently fails to achieve 99% prediction accuracy when classifying airglow images outside its training and validation sets: the trained model sometimes confuses the ‘moon’ category with the ‘twilight’ category. Fortunately, this misclassification has little practical effect, since both confused categories belong to unclear nights. The goal of the image classification is to sort out the images with suitable environmental conditions, which are the images in the ‘clear night’ category. We therefore combined the five-category classification results into a binary classification: ‘clear night’ versus the others. The results obtained from the classifier enabled us to accurately collect airglow images captured on clear nights for future research.
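Collapsing the five-way prediction into this binary decision is a one-line post-processing step; a possible sketch is given below (illustrative only; the index assigned to ‘clear night’ depends on the label encoding and is assumed here to be 0).

import numpy as np

CLEAR_NIGHT = 0   # assumed index of the 'clear night' category

def select_clear_night(probabilities):
    # Map five-way softmax outputs to a binary 'clear night vs. other' decision.
    predicted = np.argmax(probabilities, axis=1)
    return predicted == CLEAR_NIGHT

probs = np.array([[0.95, 0.02, 0.01, 0.01, 0.01],    # likely a clear night
                  [0.05, 0.80, 0.05, 0.05, 0.05]])   # likely an overcast sky
print(select_clear_night(probs))                     # [ True False]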

5. Conclusions

Based on the visualization of classification models, we analyzed and optimized the model generated by CNN training and obtained a high-accuracy (99%) image classifier to select the airglow images captured by ASAI on clear nights from massive amounts of airglow images. The modified model requires less computation and classifies more accurately than the previous model. The training process was tracked by the visualized feature maps saved after every epoch, revealing the criteria of classification, exposing the misplaced focus of the model misled by the strong artificial light at the edge of the field of view, and illustrating how the classification model is generated. Based on the feature maps, we cropped out the artificial light and chose the best model before overfitting occurred. Through practice, we found that neither more training epochs nor a deeper model could improve the classification. The model with the highest accuracy was exported to categorize airglow images captured by ASAI and pick out those obtained on clear nights for further research. Our attempt provides useful experience for the automatic processing of the observation data from the Meridian Project.

Author Contributions

Z.L.: formal analysis, conceptualization, data curation, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; Q.W.: formal analysis, conceptualization, methodology, data curation, validation, writing—review and editing; C.L.: funding acquisition, project administration, conceptualization, methodology, formal analysis, validation, supervision, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Foundation of Chongqing, grant number cstc2020jcyj-msxmX0914; the Specialized Research Fund for State Key Laboratories. The authors acknowledge the use of data from the Chinese Meridian Project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw airglow data can be obtained from the Chinese Meridian Project (http://data.meridianproject.ac.cn/en).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Peterson, A.W.; Kieffaber, L.M. Infrared photography of OH airglow structures. Nature 1973, 242, 321–322. [Google Scholar] [CrossRef]
  2. Li, Q.; Yusupov, K.; Akchurin, A.; Yuan, W.; Liu, X.; Xu, J. First OH airglow observation of mesospheric gravity waves over European Russia region. J. Geophys. Res. 2018, 123, 2168–2180. [Google Scholar] [CrossRef]
  3. Sedlak, R.; Hannawald, P.; Schmide, C.; Wüst, S.; Bittner, M.; Stanič, S. Gravity wave instability structures and turbulence from more than 1.5 years of OH* airglow imager observations in Slovenia. Atmos. Meas. Tech. 2021, 14, 6821–6833. [Google Scholar] [CrossRef]
  4. Ramkumar, T.; Malik, M.; Ganaie, B.; Bhat, A. Airglow-imager based observation of possible influences of subtropical mesospheric gravity waves on F-region ionosphere over Jammu & Kashmir, India. Sci. Rep. 2021, 11, 10168. [Google Scholar] [CrossRef]
  5. Zhou, C.; Tang, Q.; Huang, F.; Liu, Y.; Gu, X.; Lei, J.; Ni, B.; Zhao, Z. The simultaneous observations of nighttime ionospheric E region irregularities and F region medium-scale traveling ionospheric disturbances in midlatitude China. J. Geophys. Res. 2018, 123, 5195–5209. [Google Scholar] [CrossRef]
  6. Figueiredo, C.A.O.B.; Takahashi, H.; Wrasse, C.M.; Otsuka, Y.; Shiokawa, K.; Barros, D. Investigation of nighttime MSTIDs observed by optical thermosphere imagers at low latitudes: Morphology, propagation direction, and wind fltering. J. Geophys. Res.-Space 2018, 123, 7843–7857. [Google Scholar] [CrossRef]
  7. Sau, S.; Lakshmi Narayanan, V.; Gurubaran, S.; Emperumal, K. Study of wave signatures observed in thermospheric airglow imaging over the dip equatorial region. Adv. Space Res. 2018, 62, 1762–1774. [Google Scholar] [CrossRef]
  8. Xu, J.Y.; Li, Q.Z.; Sun, L.C.; Liu, X.; Yuan, W.; Wang, W.B.; Yue, J.; Zhang, S.R.; Liu, W.J.; Jiang, G.Y.; et al. The ground-based airglow imager network in China. In Upper Atmosphere Dynamics and Energetics; Wang, W.B., Zhang, Y.L., Paxton, L.J., Eds.; American Geophysical Union: Washington, DC, USA, 2021; ISBN 978-1-1195-0756-7. [Google Scholar]
  9. Yu, D.; Xu, Q.; Guo, H.; Zhao, C.; Lin, Y.; Li, D. An Efficient and Lightweight Convolutional Neural Network for Remote Sensing Image Scene Classification. Sensors 2020, 20, 1999. [Google Scholar] [CrossRef] [Green Version]
  10. Mishra, J.; Goyal, S. An effective automatic traffic sign classification and recognition deep convolutional networks. Multimed. Tools Appl. 2022, 81, 18915–18934. [Google Scholar] [CrossRef]
  11. Lanjewar, M.G.; Gurav, O.L. Convolutional Neural Networks based classifications of soil images. Multimed. Tools Appl. 2022, 81, 10313–10336. [Google Scholar] [CrossRef]
  12. Rocha, M.M.M.; Landini, G.; Florindo, J.B. Medical image classification using a combination of features from convolutional neural networks. Multimed. Tools Appl. 2022. [Google Scholar] [CrossRef]
  13. Hubel, D.H.; Wiesel, T.N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 1962, 160, 106–154. [Google Scholar] [CrossRef] [PubMed]
  14. Fukushima, K.; Miyake, S. Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position. Pattern Recogn. 1982, 15, 455–469. [Google Scholar] [CrossRef]
  15. Werbos, P.J. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1974. [Google Scholar]
  16. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  17. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, W.; Jackel, L.D. Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 1990, 2, 396–404. [Google Scholar]
  18. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  19. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  20. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef] [Green Version]
  21. Girosi, F.; Jones, M.; Poggio, T. Regularization theory and neural networks architectures. Neural Comput. 1995, 7, 219–269. [Google Scholar] [CrossRef]
  22. Hinton, G.E.; Osindero, S.; Teh, Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
  23. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. J. Mach. Learn. Res. 2010, 9, 249–256. [Google Scholar]
  25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Int. Conf. Neural Inf. Process. Syst. 2012, 60, 1097–1105. [Google Scholar] [CrossRef] [Green Version]
  26. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Int. Conf. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Toshev, A.; Szegedy, C. DeepPose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 1653–1660. [Google Scholar]
  29. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inform. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef] [Green Version]
  30. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef] [Green Version]
  31. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2021, 23, 18. [Google Scholar] [CrossRef]
  32. Erhan, D.; Bengio, Y.; Courville, A.; Vincent, P. Visualizing Higher-Layer Features of a Deep Network; University of Montreal: Montreal, QC, Canada, 2009; Volume 1341, p. 1. [Google Scholar]
  33. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Volume 8689, pp. 818–833. [Google Scholar] [CrossRef] [Green Version]
  34. Zeiler, M.D.; Taylor, G.W.; Fergus, R. Adaptive deconvolutional networks for mid and high level feature learning. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2018–2025. [Google Scholar] [CrossRef]
  35. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv 2013, arXiv:1312.6034. [Google Scholar] [CrossRef]
  36. Lai, C.; Xu, J.; Yue, J.; Yuan, W.; Liu, X.; Li, W.; Li, Q. Automatic Extraction of Gravity Waves from All-Sky Airglow Image Based on Machine Learning. Remote Sens. 2019, 11, 1516. [Google Scholar] [CrossRef] [Green Version]
  37. Wang, C. Development of the Chinese meridian project. Chin. J. Space Sci. 2010, 30, 382–384. [Google Scholar] [CrossRef]
  38. Egmont-Petersen, M.; de Ridder, D.; Handels, H. Image processing with neural networks—A review. Pattern Recogn. 2002, 35, 2279–2301. [Google Scholar] [CrossRef]
  39. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv 2012, arXiv:1207.0580. [Google Scholar] [CrossRef]
  40. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  41. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2016, arXiv:1605.07146. [Google Scholar] [CrossRef]
  42. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar] [CrossRef]
  43. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
Figure 1. The network structure of the CNN classification model. The numbers below indicate the matrix size of each layer.
Figure 2. (a) Original size (1024 × 1024 pixels) airglow image; (b) three-dimensional view of feature map. The color bar indicates the weight.
Figure 3. (ae) Examples of cropped 512 × 512 images in five categories. (a) Clear night; (b) overcast sky; (c) light band; (d) moon; (e) twilight.
Figure 4. Prediction accuracy and prediction loss in the training and validation sets over 140 epochs.
Figure 5. Feature maps of five categories: (a) clear night; (b) overcast sky; (c) light band; (d) moon; (e) twilight. The brightness of pixels on feature maps represents weights of classification, with brighter color corresponding to higher weight.
Figure 6. Feature maps of category c (‘light band’) at each epoch of the classifier training. The numbers below the images indicate the epoch.
Figure 7. Feature maps of different epochs.
Table 1. Categories of images captured by all-sky airglow imager (ASAI).
Label | Explanation | Images in the Training/Validation Set
clear night | Stars can be seen clearly; there are no apparent intense light sources other than stars. | 457/105
overcast sky | No light can be seen; completely dark. | 405/82
light band | There are obvious intense light sources other than stars, such as the light band caused by intense moonlight. | 103/22
moon | Stars cannot be easily discerned due to extensive areas of intense moonlight; there are still darker areas in the image. | 361/72
twilight | Stars cannot be recognized due to the extremely intense light emitted by the sun; completely white. | 67/14
