Mathematics
  • Article
  • Open Access

18 August 2024

Using ArcFace Loss Function and Softmax with Temperature Activation Function for Improvement in X-ray Baggage Image Classification Quality

Data Analysis and Machine Learning Department, Financial University under the Government of the Russian Federation, 125167 Moscow, Russia
This article belongs to the Special Issue Advanced Research in Fuzzy Systems and Artificial Intelligence

Abstract

Modern aviation security systems rely heavily on the work of screening operators, who, being human, are prone to fatigue, loss of attention, and similar problems. Methods exist for automatically recognizing prohibited items, but they are hampered by the specific structure of luggage X-ray images. Furthermore, such systems require significant computational resources as model size grows. Both disadvantages can largely be addressed in hardware, through new introscopes and registration techniques as well as more powerful computing devices. For processing, however, it is preferable to improve quality without increasing the computational requirements of the recognition system. This can be achieved with traditional neural network architectures and a more elaborate training process. This study proposes a new training approach: new ways of augmenting baggage X-ray images and advanced approaches to training convolutional neural networks and vision transformer networks. It is shown that using the ArcFace loss function for binary classification of items into forbidden and allowed classes provides a gain of about 3–5% for different architectures. At the same time, the softmax activation function with temperature yields more flexible estimates of the class membership probability: with a suitably chosen threshold it significantly increases the accuracy of recognizing forbidden items, and with a lowered threshold it provides high recall. The developed augmentations based on doubly stochastic image models increase the recall of recognizing dangerous items by 1–2%. On the basis of the developed classifier, the YOLO detector was modified, and a mAP gain of 0.72% was obtained. Thus, the research results meet the goal of increasing efficiency in X-ray baggage image processing.

1. Introduction

Nowadays, the task of ensuring security in areas of mass gathering of people is becoming more and more urgent. Such areas include subway stations, railway stations, bus stations, airports, concert halls, stadiums, etc. At the same time, when it comes to air transportation, the requirements for permitted baggage are usually stricter than in other places. For example, even scissors or water in medium and big containers are prohibited to carry on board. At the same time, the flow of people in airports is quite large, and the security check can take a long time. It leads in some cases to problems with boarding the flight.
It should be noted that baggage and hand luggage inspection requires both hardware to register X-ray images and software to display them on the screen of the inspection operator. Different kinds of introscopes are used for registration [1], and the resulting images differ from the optical images that are familiar from traditional image processing tasks. Objects may overlap or be separated by gaps, which creates additional difficulties for the operator. Moreover, the color gamut of the images may be somewhat nonstandard.
The result of the screening process is either the admission of a passenger’s baggage for boarding with all authorized items, or the seizure of a prohibited item and taking appropriate measures. Automating the process of deciding on the presence or absence of prohibited items in baggage or hand luggage is an important and urgent task. The solution of this task will help one to cope with human error and speed up the screening process. However, such systems will need to meet high quality standards and have extremely high accuracy values. Currently, the most likely use case seems to be the introduction of such systems as decision support tools.
Practice shows that the use of modified activation and loss functions allows us to obtain an improvement in the quality of the classification task in optical images without complicating the architecture. The motivation of this paper is also the need to improve the classification accuracy of prohibited items in X-ray luggage images without complication of neural network architectures. This is due to the need to provide real-time operation and high-accuracy requirements at the same time. The goal is to generally improve the algorithms for classification of prohibited items through new approaches to training neural networks. The objectives of research included the development of a new generative algorithm for image augmentation without the use of deep generative networks, the modification of the training process of neural networks by changing the loss functions, and comparative analysis of the performance quality of the proposed and known solutions. For this purpose, it is possible to use additional augmentation algorithms based on doubly stochastic models and new loss function ArcFace, which has not been previously used in the processing of X-ray images of luggage.

3. Materials and Methods

Let us consider the binary classification problem as the main task, for which most of the modifications will be made.
The available dataset was prepared jointly with the Ulyanovsk Civil Aviation Institute named after Boris Bugaev (Ulyanovsk, Russia). It contains 12,562 images and is not balanced: 8471 images show allowed baggage items and 4091 images show prohibited baggage items. The Ulyanovsk Civil Aviation Institute registered the images and then preprocessed them, normalizing the images and cropping only the objects of interest.
Figure 1 shows examples of images of different classes.
Figure 1. Examples of source images. Class “0” consists of allowed baggage items (not hand luggage). Class “1” consists of prohibited baggage items.
From Figure 1 it is possible to conclude that the baggage items themselves from different classes are very heterogeneous. It makes the training task difficult. Moreover, developing a multi-class recognizer is also problematic due to the strong imbalance of the different items.
Let us use the following augmentation options: no augmentations, traditional Albumentations augmentations [23], and augmentations based on the doubly stochastic image model [41,42]. Let us consider it in more detail.
Let a random field be given as $X$. For its implementation in the simplest case, a bivariate autoregressive model can be used [41]:
$$X(i,j) = \rho_x(i,j)\,X(i-1,j) + \rho_y(i,j)\,X(i,j-1) - \rho_x(i,j)\,\rho_y(i,j)\,X(i-1,j-1) + \sigma_x \sqrt{\left[1-\rho_x^2(i,j)\right]\left[1-\rho_y^2(i,j)\right]}\;\xi(i,j). \tag{1}$$
Here $\rho_x$ is the random field of correlation along the row; $\rho_y$ is the random field of correlation along the column; $\sigma_x$ is the standard deviation of the main random field; $\xi$ is the random field providing the random additive; $(i,j)$ are the pixel coordinates by row and by column.
To generate random correlation fields a similar model is used [41]:
$$\rho_k(i,j) = r_{kx}\,\rho_k(i-1,j) + r_{ky}\,\rho_k(i,j-1) - r_{kx}\,r_{ky}\,\rho_k(i-1,j-1) + \sigma_{\rho_k} \sqrt{\left(1-r_{kx}^2\right)\left(1-r_{ky}^2\right)}\;\varsigma_k(i,j), \tag{2}$$
where $r_{kx}$ and $r_{ky}$ are the row and column correlation coefficients for auxiliary fields along the k-th axis, respectively; $\sigma_{\rho_k}$ is the standard deviation of the random correlation field along the k-th axis; $\varsigma_k$ is the random additive field for the k-th axis; $k$ is the parameter defining the generated correlation field (either row correlations, then $k = x$, or column correlations, then $k = y$).
Parameter estimation of such a model can be performed as shown in [43]. Figure 2 shows an example of augmentation using the proposed model. On the left is the original image, and on the right is the augmented image. It is possible to see that the images are very close to each other, and it is easy to classify the augmented image for the human eye.
Figure 2. Augmentation based on a doubly stochastic model.
Thus, a new image is generated from a reference image that is fed to the input of the augmentation model. First, the parameters of the doubly stochastic model are estimated. It should be noted that the row and column correlation fields, as well as the mean and variance fields, are estimated, which provides a different set of internal parameters at each generated pixel. Otherwise, the model works as a classical random field regression model. That is, at high values of correlation along the row, the brightness of a new pixel is generated close to the brightness of the neighboring pixel or group of neighboring pixels along the row (similarly for the column), and at high variance the generated values become more random. The mean field brings the generated images closer to real ones and plays an important role. All generated elements are then normalized to the range from 0 to 255, giving a new image at the output of the augmenter. At the same time, generation is much faster than with generative neural networks; in other words, it takes much less time to create a new image.
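The generation step of Equations (1) and (2), followed by the rescaling to 0–255, can be sketched in NumPy as below. This is only an illustration under stated assumptions, not the author's implementation: the function names, the zero boundary conditions, the constant offset `mean_rho` added to the correlation fields, the clipping of correlations to (−1, 1), and all parameter values are hypothetical. The estimation of parameters from a reference image [43] is not covered; the parameters are simply set by hand.

```python
import numpy as np

def ar_field(rx, ry, sigma, noise):
    """2D autoregression of the form used in Eqs. (1) and (2).

    rx, ry may be scalars (Eq. 2) or per-pixel arrays (Eq. 1).
    """
    h, w = noise.shape
    rx = np.broadcast_to(rx, (h, w))
    ry = np.broadcast_to(ry, (h, w))
    # One extra zero row/column so that indices i-1 and j-1 exist on the border
    # (zero boundary conditions are an assumption of this sketch).
    x = np.zeros((h + 1, w + 1))
    for i in range(h):
        for j in range(w):
            a, b = rx[i, j], ry[i, j]
            x[i + 1, j + 1] = (
                a * x[i, j + 1]        # X(i-1, j)
                + b * x[i + 1, j]      # X(i, j-1)
                - a * b * x[i, j]      # X(i-1, j-1)
                + sigma * np.sqrt((1 - a**2) * (1 - b**2)) * noise[i, j]
            )
    return x[1:, 1:]

def doubly_stochastic_field(shape, mean_rho=0.9, r=0.95,
                            sigma_rho=0.05, sigma_x=1.0, seed=0):
    """Generate correlation fields by Eq. (2), then the main field by Eq. (1)."""
    rng = np.random.default_rng(seed)
    # mean_rho is a hypothetical offset; clipping keeps the square root real.
    rho_x = np.clip(mean_rho + ar_field(r, r, sigma_rho,
                                        rng.standard_normal(shape)), -0.99, 0.99)
    rho_y = np.clip(mean_rho + ar_field(r, r, sigma_rho,
                                        rng.standard_normal(shape)), -0.99, 0.99)
    return ar_field(rho_x, rho_y, sigma_x, rng.standard_normal(shape))

def to_uint8(field):
    """Min-max rescale a real-valued field to the 0..255 brightness range."""
    lo, hi = field.min(), field.max()
    return np.round(255 * (field - lo) / (hi - lo + 1e-12)).astype(np.uint8)

img = to_uint8(doubly_stochastic_field((32, 32)))
```

The explicit double loop mirrors the recursion directly; it is slow for large images but keeps the correspondence with the equations easy to check.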
Also, a training process modification is proposed in this article. Binary cross-entropy is most commonly used in binary classification tasks. In this case, the last layer is usually activated using a sigmoid, and the output itself can be interpreted as the probability of belonging to the positive class. In our case, such a class can be a forbidden object. The probability of the alternative (the class of allowed items) is obtained by subtracting the predicted probability from 1, since the total probability is equal to 1. Then the loss function based on binary cross-entropy can be written as follows [38]:
$$\mathrm{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right). \tag{3}$$
Here $y_i$ is the true label of positive class membership for the i-th example of training data; $p_i$ is the probability of belonging to the positive class for the i-th example of training data, obtained as a result of the inference of the neural network model; $N$ is the number of examples.
Thus, the model receives large penalties if it predicts a value that can be interpreted as an incorrect answer at the 0.5 threshold. But the penalties for predicting, for example, 0.3 instead of 0 are rather small, because the final answer will be correct with such a prediction and a threshold of 0.5.
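A small numerical check of this behavior, written as a sketch of Equation (3) in NumPy. The function name `binary_cross_entropy` and the clipping constant `eps` are illustrative choices, not part of the original work.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy, Eq. (3); eps avoids log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# True label 0 in both cases.
loss_mild = binary_cross_entropy([0], [0.3])   # correct side of the 0.5 threshold
loss_wrong = binary_cross_entropy([0], [0.7])  # wrong side of the 0.5 threshold
```

For a true label of 0, predicting 0.3 costs about 0.36, while predicting 0.7 costs about 1.20, so the penalty grows sharply once the prediction crosses the threshold.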
A generalization of the sigmoid function is the softmax activation function [44], which is needed to interpret the outputs as probabilities of belonging to each of a set of classes. It can also be applied to the problem with two classes. The relationship between the logits (outputs of the last layer before activation), the activation function, and the loss function is presented in Figure 3.
Figure 3. Using softmax to calculate cross-entropy.
Let us write the expression for softmax [44] as in Figure 3:
$$f(\hat{y}_i) = \frac{e^{\hat{y}_i}}{\sum_{j=1}^{C} e^{\hat{y}_j}}. \tag{4}$$
Here $\hat{y}_i$ is the value of the i-th logit of the last layer of the neural network before activation; $C$ is the number of classes.
It is clear that in (4) the sum of all outputs will be equal to 1. This is valid also for the case $C = 2$.
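Both properties can be verified numerically. The following sketch implements Equation (4) and checks that, for two classes, the softmax output equals a sigmoid of the logit difference; the function names and the sample logits are illustrative.

```python
import numpy as np

def softmax(logits):
    """Softmax of Eq. (4) along the last axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([0.4, 1.9])       # illustrative logits for C = 2
probs = softmax(logits)
total = probs.sum()                  # equals 1 up to rounding
p_pos_sigmoid = sigmoid(logits[1] - logits[0])
```

The equivalence follows from dividing the numerator and denominator of (4) by the exponent of the positive-class logit.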
By analogy with expression (3), let us rewrite the cross-entropy loss for the task with multiple classes [44]:
$$CE = -\sum_{i=1}^{C} y_i \log\left( f(\hat{y}_i) \right). \tag{5}$$
It should be noted that the activation function must be used under the logarithm, because it gives the final predictions at the neural network output. In the standard situation, expression (4) can be substituted, and for two classes the result for cross-entropy will coincide with the result for the sigmoid in (3). Let us use a more advanced transformation, which was proposed in the task of face identification and is called ArcFace [45]:
$$f(\hat{y}_i) = \frac{e^{\hat{y}_i + m}}{\sum_{j=1,\, j \neq i}^{C} e^{\hat{y}_j} + e^{\hat{y}_i + m}}, \tag{6}$$
where $\hat{y}_i$ is the value of the i-th logit of the last layer of the neural network before activation, $C$ is the number of classes, and $m$ is the margin.
It would be more useful to add a margin in the form of an angle factor, which would move the logit vectors for different classes farther apart [45]:
$$f(\hat{y}_i) = \frac{e^{s \cos(\theta_{\hat{y}_i} + m)}}{\sum_{j=1,\, j \neq i}^{C} e^{s \cos(\theta_{\hat{y}_j})} + e^{s \cos(\theta_{\hat{y}_i} + m)}}. \tag{7}$$
Here $\theta$ is the angle between the normalized feature embedding and the normalized weight vector of the corresponding class [45], so the optimized model parameters enter the loss through this angle. Let us also introduce a standardization (scaling) coefficient $s$.
The main parameter is the margin $m$, which separates each class from all others in the extracted feature space. Too large a value of $m$ can degrade the results, so that the classes are mixed even more. It is therefore better to implement the margin as an angular deviation, so that this parameter is responsible for an angle and its optimization is simplified.
Thus, the ArcFace function ensures that each class is more removed relative to the others. The ArcFace loss function is designed to improve the discriminative power of the learned feature embeddings in classification tasks, such as face recognition. It achieves this by introducing an additive angular margin penalty between the embeddings of different classes. Typically, in a classification problem, the similarity between an input feature embedding and the reference embeddings of each class is computed, often using cosine similarity, and the class with the highest similarity score is predicted as the output. The ArcFace loss aims to increase the angular distance between the embeddings of different classes by adding a margin to the cosine similarity score of the ground truth class. Specifically, the ArcFace loss function penalizes the model when the cosine similarity between the input embedding and the reference embedding of the ground truth class is not sufficiently larger than the cosine similarities with the reference embeddings of other classes. This angular margin penalty is applied in the log-softmax formulation of the loss function, which normalizes the similarity scores to obtain a valid probability distribution over the classes.
The use of normalized embeddings (i.e., unit-length feature vectors) in the ArcFace loss also helps to simplify the optimization process, as the cosine similarity scores are bounded between −1 and 1. This property allows the model to focus on learning the relative angular relationships between the embeddings, rather than their absolute magnitudes. By minimizing this loss function during training, the model learns to push the embeddings of different classes further apart in the angular space, improving the overall classification performance, particularly in face recognition applications.
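The mechanism described above can be sketched in NumPy as follows. This is a minimal illustration of the additive angular margin from [45], not the implementation used in the experiments: the function names, the toy embeddings and class weights, and the values `s=30.0` and `m=0.5` are assumptions made for the example. The sketch also omits the safeguard that practical implementations apply when the angle plus the margin exceeds π.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=30.0, m=0.5):
    """Scaled cosine logits with an angular margin added for the true class."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)            # cosine similarities, shape (N, C)
    theta = np.arccos(cos)                        # angles to each class vector
    rows = np.arange(len(labels))
    out = cos.copy()
    out[rows, labels] = np.cos(theta[rows, labels] + m)  # margin on true class only
    return s * out

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Toy two-class example (allowed = 0, forbidden = 1); all values are made up.
embeddings = np.array([[1.0, 0.2], [0.1, 1.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])

loss_plain = cross_entropy(arcface_logits(embeddings, weights, labels, m=0.0), labels)
loss_margin = cross_entropy(arcface_logits(embeddings, weights, labels, m=0.5), labels)
```

With the margin enabled, the same correctly classified samples receive a larger loss, which is what pushes the embeddings of the two classes further apart during training.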
Another important aspect is that training uses a labeling in which the correct answers are the probabilities 0 and 1. Due to this, the models are trained such that most of the predictions are distributed around 0 and 1. This makes the fuzzy logic system [46] more predictable. To avoid this problem, it is possible to calculate probabilities not with the plain softmax function but with softmax with temperature [46]:
$$\mathrm{softmax}(y)_i = \frac{e^{y_i / T}}{\sum_{j=1}^{N} e^{y_j / T}}, \tag{8}$$
where $T$ characterizes the temperature and $N$ is the number of outputs. Thus, we can equalize the probability distribution at the output.
The larger the temperature value $T$, the closer the output layer distribution is to a uniform distribution.
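The effect of the temperature can be illustrated with a short NumPy sketch of Equation (8). The function name `softmax_t` and the example logits are assumptions; the temperatures are chosen only to show the trend.

```python
import numpy as np

def softmax_t(logits, T=1.0):
    """Softmax with temperature, Eq. (8), along the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([2.0, 0.5])        # illustrative two-class logits
sharp = softmax_t(logits, T=0.1)     # close to a hard 0/1 decision
plain = softmax_t(logits, T=1.0)     # ordinary softmax
smooth = softmax_t(logits, T=1.5)    # closer to the uniform distribution
```

The winning class stays the same for every temperature; only the confidence assigned to it changes, which matters once a decision threshold other than 0.5 is applied.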
Let us consider the results that the proposed modifications provide.

4. Results and Discussion

Let us compare known convolutional neural network architectures and vision transformer architectures. Table 1 shows the different models and their quality scores on the recall metric for forbidden items. The share of augmented images is 10% in all cases.
Table 1. Comparison of recall metrics for prohibited items.
The experiments were performed in the Python programming language using the PyTorch framework and the NumPy and pandas libraries. An ASUS TUF FX504 laptop (CPU Intel Core i7-8750H, 16 GB RAM) with an NVIDIA GeForce GTX 1060 GPU (6 GB) was used as the computing device. Taking into account the memory size, a batch size of 8 images was chosen.
Transfer learning was used for the architectures already available in PyTorch. In the first case, the cross-entropy function was used as the loss function, and in the second case, the proposed modified loss function implemented using NumPy was applied.
Regarding the data, validation was performed on the proprietary dataset presented earlier and on other known benchmarks.
The analysis of the data presented in Table 1 shows that the proposed methods improve the quality within individual models in general.
At the same time, augmentations have a better effect on networks with convolutional architecture, while ArcFace provides a gain in recall characteristics for all architectures.
The developed classifier was implemented on the additional layer of YOLO instead of ResNet, and experiments were performed. Since the SWIN model showed the best metrics, testing was performed for it, for YOLO with basic ResNet and YOLO without the additional classifier.
The results for the YOLOv7m (medium) and YOLOv10m (medium) models were compared. The comparative performance is summarized in Table 2 (YOLOv7) and Table 3 (YOLOv10). For the detection task, the original dataset from which the objects were cropped for classification was used. It comprises 2542 images with more than 10,000 objects. The main detection metric was the mean average precision (mAP).
Table 2. Comparison of detection models based on YOLOv7.
Table 3. Comparison of detection models based on YOLOv10.
It is possible to see that the intersection over union (IoU) metric depends only on model type and IoU is higher for YOLOv10. The analysis of the presented results allows us to conclude that the introduction of an additional classifier provides an increase in the quality of detection of the prohibited objects by about 1% compared to the use of ResNet. However, the problem of transformer architectures is their slow speed of operation. And the metrics for YOLOv10 are better. The value of mAP was 0.714 for GroundingDINO model, but this model is very slow.
Finally, let us consider the application of the softmax function with temperature for activation. At T = 0.1 we obtain distant distributions (Figure 4); increasing T to 1.5 brings the probabilities at both ends closer (Figure 5).
Figure 4. Distribution for the small temperature.
Figure 5. Distribution for the middle temperature.
Smoothing distributions also provides a recall gain if a threshold even lower than 0.5 is used for further classification. Studies of a test dataset have shown that applying functions with temperature in the neighborhood of 0.5 allows up to 1.2% gain in the recall of forbidden object detection compared to the traditional softmax function. However, these studies are expected to be more thorough in the future.
Table 4 presents results using different activation functions.
Table 4. Comparison of activation functions.
Table 4 shows that using temperature makes it possible to increase the recall of prohibited item classification, although precision decreases slightly.
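The trade-off between recall and precision when the decision threshold is lowered can be reproduced on a toy example. The sketch below is illustrative only: the labels and probabilities are made up and do not come from the paper's datasets, and `precision_recall` is a hypothetical helper.

```python
import numpy as np

def precision_recall(y_true, p_pos, threshold):
    """Precision and recall of the positive (forbidden) class at a threshold."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(p_pos) >= threshold
    tp = np.sum(pred & y)      # forbidden items flagged as forbidden
    fp = np.sum(pred & ~y)     # allowed items flagged as forbidden
    fn = np.sum(~pred & y)     # forbidden items that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)

# Made-up smoothed probabilities of the forbidden class for six items.
y_true = [1, 1, 1, 0, 0, 0]
p_pos = [0.9, 0.6, 0.4, 0.45, 0.2, 0.1]

at_half = precision_recall(y_true, p_pos, threshold=0.5)
at_low = precision_recall(y_true, p_pos, threshold=0.35)
```

In this toy case, lowering the threshold from 0.5 to 0.35 recovers the missed forbidden item (recall rises from 2/3 to 1) at the cost of one false alarm (precision falls from 1 to 0.75).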
Also, for the proposed learning function, the stability of the models was tested by applying the modified loss function. The analysis was performed for different values of the margin m . As a result, it was found that the learning curves converge, i.e., the method is stable. Figure 6 demonstrates accuracy for different margins.
Figure 6. Training curve families for ResNet using different margins.
To ensure that the proposed solutions are adequate, we chose other datasets and checked the results. We used the Kaggle Suitcase/Luggage Dataset [47] and the HiXray dataset [48]. Table 5 and Table 6 compare results for mAP metric for the first and second dataset, respectively. It should be noted that the training included 20 epochs with default hyperparameters.
Table 5. Detection results on Kaggle Suitcase/Luggage Dataset.
Table 6. Detection results on HiXray dataset.
The key finding from the tables is that the integration of the ArcFace loss function into the YOLOv7 and YOLOv10 object detection models results in improved detection performance compared to the standalone YOLO models. Specifically, on the Kaggle Suitcase/Luggage Dataset, the YOLOv7 + ArcFace model achieves mAP of 0.589, which is higher than the mAP of 0.521 achieved by the standalone YOLOv7 model, and the YOLOv10 + ArcFace model achieves a mAP of 0.596, which is higher than the mAP of 0.563 achieved by the standalone YOLOv10 model. Similarly, on the HiXray dataset, the YOLOv7 + ArcFace model achieves a mAP of 0.714, which is higher than the mAP of 0.687 achieved by the standalone YOLOv7 model, and the YOLOv10 + ArcFace model achieves a mAP of 0.722, which is higher than the mAP of 0.701 achieved by the standalone YOLOv10 model.
However, there are some limitations for this approach. First, the approach can exhibit biases towards certain subgroups within the data, leading to disparities in performance and fairness concerns. The model’s vulnerability to adversarial attacks is also a common issue, where carefully crafted perturbations can cause misclassifications. The dependence on the quality and diversity of the training data is a fundamental challenge, as biases in the data can lead to suboptimal performance in real-world scenarios. Scaling the approach to handle large-scale problems can be computationally expensive, as the computational cost may increase with the size of the reference database or problem complexity. Additionally, the widespread deployment of such systems raises privacy and ethical concerns, which need to be carefully addressed. Finally, improving the generalization of the approach to handle novel or unseen inputs is an important research direction, as it can enhance the robustness and versatility of the system. Addressing these limitations is crucial for developing reliable, fair, and ethically-aligned machine learning and deep learning-based solutions.
So, the proposed methods demand future optimization for choosing the margin parameter value, and it will be discussed in the next publications.

5. Conclusions

Thus, in this article, neural network layers and target functions have been modified in training neural networks for binary classification of luggage X-ray images. The computational cost of the approach for face recognition is primarily dominated by the feature extraction network, whose complexity scales with the network architecture and input image size. The embedding calculation and cosine similarity computation between the input and reference embeddings also contribute to the overall cost, with the latter scaling linearly with the size of the reference database. However, the additional computations for the ArcFace loss function are relatively insignificant compared to the other components. The exact computational requirements will depend on the specific implementation and hardware, but the optimization of the feature extraction network is crucial for achieving efficient face recognition using the ArcFace approach. A method for augmenting such images using doubly stochastic random field models is proposed, showing gains to the quality of the models in the sense of a recall metric. The results obtained are also translated to models of object detection in X-ray baggage and hand luggage images. Studies of the proposed algorithms on other datasets have also shown their robustness and high metrics in recognizing prohibited baggage items. Initial experiments using softmax with temperature have shown its potential in the task of baggage image classification. In particular, this approach helps one to increase recall value. But more in-depth studies on this issue are planned in future works.

Funding

This study received no external funding.

Data Availability Statement

Research data are the property of the author and may be presented by request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Z.; Wang, X.; Shi, Y.; Qi, H.; Jia, M.; Wang, W. Lightweight Detection Method for X-ray Security Inspection with Occlusion. Sensors 2024, 24, 1002.
  2. Kajla, V.; Gupta, A.; Khatak, A. Analysis of X-Ray Images with Image Processing Techniques: A Review. In Proceedings of the 2018 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 14–15 December 2018; pp. 1–4.
  3. Riz à Porta, R.; Sterchi, Y.; Schwaninger, A. How Realistic Is Threat Image Projection for X-ray Baggage Screening? Sensors 2022, 22, 2220.
  4. Kim, J.-W.; Choi, H.-W.; Kim, S.-K.; Na, W.S. Review of Image-Processing-Based Technology for Structural Health Monitoring of Civil Infrastructures. J. Imaging 2024, 10, 93.
  5. Andriyanov, N.A.; Volkov, A.K.; Volkov, A.K.; Gladkikh, A.A. Research of recognition accuracy of dangerous and safe X-ray baggage images using neural network transfer learning. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1061, 012002.
  6. Mery, D.; Saavedra, D.; Prasad, M. X-Ray Baggage Inspection With Computer Vision: A Survey. IEEE Access 2020, 8, 145620–145633.
  7. Harris, D.H. How to Really Improve Airport Security. Ergon. Des. 2002, 10, 17–22.
  8. Koller, S.M.; Drury, C.G.; Schwaninger, A. Change of search time and non-search time in X-ray baggage screening due to training. Ergonomics 2009, 52, 644–656.
  9. Biggs, A.T.; Mitroff, S.R. Improving the efficacy of security screening tasks: A review of visual search challenges and ways to mitigate their adverse effects. Appl. Cogn. Psychol. 2015, 29, 142–148.
  10. Schwaninger, A. Threat Image Projection: Enhancing performance? Aviat. Secur. Int. 2006, 13, 36–41.
  11. Donnelly, N.; Muhl-Richardson, A.; Godwin, H.J.; Cave, K.R. Using eye movements to understand how security screeners search for threats in x-ray baggage. Vision 2019, 3, 24.
  12. Buser, D.; Sterchi, Y.; Schwaninger, A. Why stop after 20 minutes? Breaks and target prevalence in a 60-minute X-ray baggage screening task. Int. J. Ind. Ergon. 2020, 76, 102897.
  13. Godwin, H.J.; Menneer, T.; Cave, K.R.; Donnelly, N. Dual-target search for high and low prevalence X-ray threat targets. Vis. Cogn. 2010, 18, 1439–1463.
  14. Wolfe, J.M.; Horowitz, T.S.; Van Wert, M.J.; Kenner, N.M.; Place, S.S.; Kibbi, N. Low Target Prevalence Is a Stubborn Source of Errors in Visual Search Tasks. J. Exp. Psychol. Gen. 2007, 136, 623–638.
  15. Hofer, F.; Schwaninger, A. Using threat image projection data for assessing individual screener performance. WIT Trans. Built Environ. 2005, 82, 417–426.
  16. Skorupski, J.; Uchroński, P. A Human Being as a Part of the Security Control System at the Airport. Procedia Eng. 2016, 134, 291–300.
  17. Meuter, R.F.I.; Lacherez, P.F. When and Why Threats Go Undetected: Impacts of Event Rate and Shift Length on Threat Detection Accuracy during Airport Baggage Screening. Hum. Factors 2016, 58, 218–228.
  18. Hackman, R.; Oldham, G.R. Motivation through the design of work: Test of a theory. Organ. Behav. Hum. Perform. 1976, 16, 250–279.
  19. Humphrey, S.E.; Nahrgang, J.D.; Morgeson, F.P. Integrating Motivational, Social, and Contextual Work Design Features: A Meta-Analytic Summary and Theoretical Extension of the Work Design Literature. J. Appl. Psychol. 2007, 92, 1332–1356.
  20. Roach, G.D.; Lamond, N.; Dawson, D. Feedback has a positive effect on cognitive function during total sleep deprivation if there is sufficient time for it to be effectively processed. Appl. Ergon. 2016, 52, 285–290.
  21. Eckner, J.T.; Chandran, S.K.; Richardson, J.K. Investigating the role of feedback and motivation in clinical reaction time assessment. PM&R 2011, 3, 1092–1097.
  22. European Commission. Commission Implementing Regulation (EU) 2015/1998 of 5 November 2015 Laying Down Detailed Measures for the Implementation of the Common Basic Standards on Aviation Security L 299; Publication Office of the European Union: Luxembourg, 2015; pp. 1–142.
  23. Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and Flexible Image Augmentations. Information 2020, 11, 125.
  24. Andriyanov, N.A.; Dementiev, V.E.; Fu, L. Neural Network Style Transfer of Defects from Concrete to Metal to Improve Monitoring Efficiency. In Proceedings of the 2024 26th International Conference on Digital Signal Processing and its Applications (DSPA), Moscow, Russia, 27–29 March 2024; pp. 1–4.
  25. Kutyrev, A.; Andriyanov, N. Apple Flower Recognition Using Convolutional Neural Networks with Transfer Learning and Data Augmentation Technique. E3S Web Conf. 2024, 493, 01006.
  26. Andriyanov, N. Methods for Preventing Visual Attacks in Convolutional Neural Networks Based on Data Discard and Dimensionality Reduction. Appl. Sci. 2021, 11, 5235.
  27. Andriyanov, N. Deep Learning for Detecting Dangerous Objects in X-rays of Luggage. Eng. Proc. 2023, 33, 20.
  28. Lázaro, P.; Ariel, M. Image recognition for x-ray luggage scanners using free and open source software. In Proceedings of the XXIII Congreso Argentino de Ciencias de la Computación, Buenos Aires, Argentina, 9–13 October 2017; pp. 1–10.
  29. Chang, A.; Zhang, Y.; Zhang, S.; Zhong, L.; Zhang, L. Detecting prohibited objects with physical size constraint from cluttered X-ray baggage images. Knowl.-Based Syst. 2022, 237, 107916.
  30. Chavaillaz, A.; Schwaninger, A.; Michel, S.; Sauer, J. Expertise, Automation and Trust in X-Ray Screening of Cabin Baggage. Front. Psychol. 2019, 10, 256.
  31. Iluebe, G.; Katsigiannis, S.; Ramzan, N. IEViT: An enhanced vision transformer architecture for chest X-ray image classification. Comput. Methods Programs Biomed. 2022, 226, 107141.
  32. Manakitsa, N.; Maraslidis, G.S.; Moysis, L.; Fragulis, G.F. A Review of Machine Learning and Deep Learning for Object Detection, Semantic Segmentation, and Human Action Recognition in Machine and Robotic Vision. Technologies 2024, 12, 15.
  33. Wasserthal, J.; Meyer, M.; Breit, H.C.; Cyriac, J.; Yang, S.; Segeroth, M. Totalsegmentator: Robust segmentation of anatomical structures in CT images. arXiv 2022, arXiv:2208.05868.
  34. Paniego, S.; Sharma, V.; Cañas, J.M. Open Source Assessment of Deep Learning Visual Object Detection. Sensors 2022, 22, 4575.
  35. Andriyanov, N.; Papakostas, G. Optimization and Benchmarking of Convolutional Networks with Quantization and OpenVINO in Baggage Image Recognition. In Proceedings of the 2022 VIII International Conference on Information Technology and Nanotechnology (ITNT), Samara, Russia, 23–27 May 2022; pp. 1–4.
  36. Solodskikh, K.; Kurbanov, A.; Aydarkhanov, R.; Zhelavskaya, I.; Parfenov, Y.; Song, D.; Lefkimmiatis, S. Integral Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 16113–16122.
  37. Fang, C.; Liu, J.; Han, P.; Chen, M.; Liao, D. FSVM: A Few-Shot Threat Detection Method for X-ray Security Images. Sensors 2023, 23, 4069.
  38. Han, L.; Ma, C.; Liu, Y.; Jia, J.; Sun, J. SC-YOLOv8: A Security Check Model for the Inspection of Prohibited Items in X-ray Images. Electronics 2023, 12, 4208.
  39. Jing, B.; Duan, P.; Chen, L.; Du, Y. EM-YOLO: An X-ray Prohibited-Item-Detection Method Based on Edge and Material Information Fusion. Sensors 2023, 23, 8555.
  40. Jang, H.; Lee, C.; Ko, H.; Lim, K. Data Augmentation of X-ray Images for Automatic Cargo Inspection of Nuclear Items. Sensors 2023, 23, 7537.
  41. Andriyanov, N.A.; Andriyanov, D.A. The using of data augmentation in machine learning in image processing tasks in the face of data scarcity. J. Phys. Conf. Ser. 2020, 1661, 012018.
  42. Andriyanov, N.A.; Vasiliev, K.K.; Dement’ev, V.E. Analysis of the efficiency of satellite image sequences filtering. J. Phys. Conf. Ser. 2018, 1096, 012036.
  43. Vasiliev, K.K.; Dementyiev, V.E.; Andriyanov, N.A. Using probabilistic statistics to determine the parameters of doubly stochastic models based on autoregression with multiple roots. J. Phys. Conf. Ser. 2019, 1368, 032019.
  44. Bruch, S.; Wang, X.; Bendersky, M.; Najork, M. An Analysis of the Softmax Cross Entropy Loss for Learning-to-Rank with Binary Relevance. In Proceedings of the 2019 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2019), Santa Clara, CA, USA, 2–5 October 2019; pp. 75–78.
  45. Deng, J.; Guo, J.; Yang, J.; Xue, N.; Kotsia, I.; Zafeiriou, S. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. arXiv 2018, arXiv:1801.07698.
  46. Agayan, S.; Bogoutdinov, S.; Kamaev, D.; Dzeboev, B.; Dobrovolsky, M. Trends and Extremes in Time Series Based on Fuzzy Logic. Mathematics 2024, 12, 284.
  47. Kaggle Suitcase/Luggage Dataset. Available online: https://www.kaggle.com/datasets/dataclusterlabs/suitcaseluggage-dataset (accessed on 7 August 2024).
  48. HiXray Dataset. Available online: https://github.com/HiXray-author/HiXray/tree/main (accessed on 7 August 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
