As reported by [1], 74,703 publications have been published about convolutional neural networks, receiving 2,050,983 citations. The earliest article on CNNs is from the 1990s [2], where the LeNet-5 model was proposed by LeCun. For many years, progress was limited by a lack of suitable tools; the possibility of reducing the dimensionality of data using neural networks was demonstrated in 2006 [3]. The most popular models for image classification, including face recognition, such as AlexNet, DeepFace, and DeepID, were demonstrated in [4]. In the medical industry, AI is used, for example, to recognise abnormalities in lung lesions during X-ray analysis [5]. Image recognition is also useful for autonomous cars when recognising road signs. In CNNs, the activation functions, convolution operations, pooling, and training data are significant. Another aspect of classical neural networks is reliability in security systems. The Python programming language can be used to implement image classification. Using high-performance computing (HPC) and parallel processing, CNN models can be accelerated to reduce classification time. Different data pre-processing methods and hardware configurations can also affect the effectiveness of image categorisation. Optimised Python environments for HPC can significantly increase the speed of image classification while maintaining high accuracy. Paper [6] discusses a performance comparison of spiking CNNs and traditional CNNs in image classification tasks using Python. Currently, CNNs, or deep neural networks, have real-world industrial applications, for example in the food and automation industries. In paper [7], MobileNet was shown to provide better accuracy than other models, such as DenseNet and traditional CNN models: in an example of bird species classification, the MobileNet model outperformed the other models in terms of accuracy. Similarly, in a study of tomato seed variety classification, the MobileNet model achieved the highest classification accuracy [8], and in a study of rice plant leaf classification, the MobileNet model, trained for 150 epochs, gave the highest accuracy [9]. These findings highlight the effectiveness of the MobileNet model in various image classification tasks. Another example of CNN applications is the EfficientNetB0 model, which has shown promising results in skin disease diagnosis [10], haemoglobin level classification for anaemia diagnosis [11], and white blood cell classification [12]. In the diagnosis of anaemia, the EfficientNetB0 model achieved a high accuracy of 97.52%, and in the classification of white blood cells, an accuracy of 99.02%. In medical applications, several papers, including [13], have proposed a CNN-based brain tumour classification model (BCM-CNN) based on CNN hyperparameter optimisation using an adaptive dynamic sine-cosine fitness grey wolf optimiser (ADSCFGWO) algorithm. The training model was built using Inception-ResNetV2. This model uses common pre-trained models (Inception-ResNetV2) to improve brain tumour diagnosis, producing a binary output (0: normal, 1: tumour). As a classifier, BCM-CNN achieved an accuracy of 99.98% on the BraTS 2021 dataset. More than one learning model can be used in a study. For example, paper [14] proposes a technique to segment organs of the gastrointestinal tract (small intestine, large intestine, and stomach) to help radiation oncologists treat cancer patients more quickly and accurately. The proposed model segments small-size images to extract local features more efficiently and uses six transfer learning models as the backbone of a U-Net topology: InceptionV3, SeResNet50, VGG19, DenseNet121, InceptionResNetV2, and EfficientNetB0. The results show that the suggested model outperformed all the other transfer learning models. In the industrial sector, a solution has been proposed by the authors of [15], where the learning model targeted electromagnetic leakage from AES encryption chips, learning an attack response model for cryptographic FPGA chips. Another CNN model is Inception, an example application of which is the contactless identification of people based on facial features read from an image, where the validation accuracy reached 99.7% [16]. Interesting results were reported in paper [17], where the Inception model was used to accurately decode and recognise EEG motor imagery. In paper [18], it was shown that knife-type hand-held weapons carried by armed individuals can be recognised in digital images with an accuracy of 87%. Study [19] applied synthetic and real images to CNNs to classify warehouse items. The AI model, based on a combination of DenseNet and ResNet pipelines for colour and depth images, outperformed single CNNs in terms of accuracy and precision, achieving 95.23% accuracy.
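The CNN building blocks named above, convolution, activation, and pooling, can be illustrated with a minimal NumPy sketch. The image and kernel values below are invented for illustration and are not taken from any of the cited studies:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1) of a single-channel image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """ReLU activation: elementwise max(0, x)."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# A toy 6x6 "image" with values increasing left to right, and a 3x3
# vertical-edge kernel that responds to that horizontal gradient.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])
features = max_pool(relu(conv2d(image, kernel)))
print(features.shape)  # (2, 2)
```

A real CNN stacks many such convolution/activation/pooling stages and learns the kernel values from training data instead of fixing them by hand.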
Research on convolutional networks has also moved into industrial settings. For example, a lightweight model, CondenseNetV2, was proposed to identify surface defects during manufacturing and successfully detected faults while running at low frequency on edge equipment [20]. Deep neural networks have been used in social media, in the context of Industry 4.0, to analyse social sentiment on websites and investigate customer satisfaction [21]. Another example of AI is a study of a chemical wastewater treatment plant, which used a large database collected over a period of 20 months, including data such as temperature, machine operation, and water purity. Three algorithms were used to predict wastewater quality: support vector regression (SVR), a long short-term memory (LSTM) neural network, and a gated recurrent unit (GRU) neural network. The experimental results showed that the GRU model performed better (MAPE = 10.18%, RMSE = 35.67%, MAE = 31.16%) than LSTM and SVR [22]. A similar theme is addressed in paper [23], where raw data generated by an environmental Internet of Things (EIoT) platform, part of a real case study implemented in Briatico (Italy), were collected and hosted on a server that processes and manages real-time information about the plant. Paper [24] used CNNs to predict the ‘trajectory’ of object grasping in real time in an industrial application. Siemens introduced the ability to observe and respond to images in real time [25,26]. For this task, the manufacturer proposed a specialised artificial intelligence module enabling the inclusion of an artificial intelligence model algorithm based on a CNN architecture.
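The GRU result above is reported with three regression metrics: MAPE, RMSE, and MAE. As a reminder of what these measure, here is a minimal NumPy sketch; the sample readings are invented for illustration and do not come from the cited study:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target variable."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target variable."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical effluent-quality readings vs. model predictions
actual    = [100.0, 120.0, 80.0, 90.0]
predicted = [110.0, 115.0, 76.0, 99.0]
print(mape(actual, predicted))  # ~7.29 (percent)
print(rmse(actual, predicted))  # ~7.45
print(mae(actual, predicted))   # 7.0
```

RMSE penalises large individual errors more heavily than MAE, while MAPE expresses error relative to the magnitude of each reading, which is why studies such as [22] typically report all three together.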
In this paper, the authors investigated a new training dataset covering three different groups of RGB images. Subsequently, they analysed the performance and accuracy of the built model over several epoch ranges. The existing MobileNet model was compared with a CNN architecture, and EfficientNetB0 and InceptionV3 models were proposed, programmed, and implemented. The models were verified, and characteristics were plotted for a range of different epochs to test the accuracy of the learned models for image verification. This research is the first step towards realising image recognition on a real object configured with an S7-1500 family controller compatible with the AI module and an Intel RealSense camera.