A Novel on Conditional Min Pooling and Restructured Convolutional Neural Network

There is no doubt that the CNN has advanced remarkably as the core technology of computer vision, but the pooling techniques used in CNNs have their own issues. This study set out to solve these issues by proposing conditional min pooling and a restructured convolutional neural network whose pooling structure was improved to ensure efficient use of conditional min pooling. Part of Caltech 101 and crawling data were used to test the performance of conditional min pooling and the restructured convolutional neural network. In the pooling performance test based on Caltech 101, accuracy increased by 0.16~0.52% and loss decreased by 19.98~28.71% compared with the old pooling techniques. The restructured convolutional neural network did not improve greatly on the old algorithms, but it produced meaningful outcomes with performance similar to theirs. This paper reports that the improvement appeared in a reduced loss rather than in a higher accuracy, and that this result was achieved without any improvement to convolution.


Introduction
Before deep learning, computer vision relied on rule-based approaches in which the characteristics of an object had to be identified and programmed by hand. At ILSVRC 2012, the ImageNet challenge, the deep learning-based AlexNet [1] overwhelmingly outperformed the rule-based approaches. Since then, computer vision has shifted from rule-based to deep learning-based approaches. Deep learning not only performs better than the old rule-based approaches but also learns by itself, finding important patterns and rules in large data without the need to program the characteristics of each object one by one. It received further attention as it made possible vision testing that the old rule-based approaches could not handle. Through deep learning, computer vision has improved its performance and developed sophisticated technologies such as object detection and segmentation [2,3].
The spread of the Internet and of portable electronic devices with built-in cameras has popularized social media and given easy access to videos, creating an environment where high-quality video data are easily obtainable. This environment suits deep learning, which makes use of large volumes of data, and has promoted its massive development. Advanced deep learning has been applied to a variety of industries, including manufacturing, medicine, fashion, and agriculture, with various achievements [4–9]. There is also intensive research on deep learning as a core technology of unmanned systems such as autonomous vehicles, autonomous drones, and smart factories [10–12].
Computer vision technologies are essential for computers or robots that, like human eyes, examine the current state and decide on the next move by recognizing an object or figuring out a situation. Because computer vision is such a core technology, even a small error can cause serious accidents and loss of life, which is why it requires more precise performance [13–18].
A convolutional neural network (CNN), the type of deep learning model most studied in computer vision, is not free from problems that can lead to such accidents [19–26]. Pooling, one of the structures of a CNN, plays an important role in reducing the amount of computation and preventing overfitting by reducing the size of a feature map [5,27–31]. However, each pooling technique raises its own issues. First, min pooling extracts the minimum value in a window of a feature map as the feature value. When a 0 lies in the same window, the feature itself can disappear, and noise can be picked up as a feature. Second, average pooling takes the mean of the positive and negative features in the same window, which can offset the features completely or blur them. Third, max pooling is the most widely used because it extracts clear features by choosing only the strongest one in the same window [32,33]. By extracting only the strongest features, however, it can make small or minute features disappear, and it is prone to overfitting [34–38].
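The three issues described above can be reproduced on a toy feature map. The following is a minimal sketch, assuming 2 × 2 windows with stride 2; the helper name `pool2d` and the numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def pool2d(feature_map, reduce_fn, size=2, stride=2):
    """Apply reduce_fn to every size x size window of a 2-D feature map."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce_fn(feature_map[r:r + size, c:c + size])
    return out

# Illustrative values only (assumption, not data from the paper).
fmap = np.array([
    [0.0, 0.9,  0.5, -0.5],   # top-left window: one 0 next to strong features
    [0.8, 0.7, -0.5,  0.5],   # top-right window: positives and negatives
    [0.9, 0.1,  0.3,  0.3],   # bottom-left window: one strong, three weak
    [0.1, 0.1,  0.3,  0.3],
])

min_out = pool2d(fmap, np.min)   # the single 0 hides 0.7~0.9 in the top-left window
avg_out = pool2d(fmap, np.mean)  # +0.5 and -0.5 cancel in the top-right window
max_out = pool2d(fmap, np.max)   # only 0.9 survives in the bottom-left window

print(min_out)
print(avg_out)
print(max_out)
```

In this toy case, min pooling returns 0 for a window full of strong features, average pooling returns 0 for a window whose features cancel, and max pooling discards the three weak features that share a window with one strong feature.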
The present study proposes conditional min pooling (CMP) and a restructured pooling structure for its efficient utilization in order to solve these problems of pooling techniques. This paper is organized as follows: Section 2 introduces the pooling techniques and pooling structures on which the research builds; Section 3 explains the proposed CMP and restructured pooling structure; Section 4 assesses the performance of the proposed technique against the old approaches; and Section 5 presents the conclusions.

Pooling Method
Min et al. [39] adopt the window method used by the old pooling techniques to scan a feature map and then extract feature values based on probabilities, without any particular conditions. The method computes a probability for each feature value by dividing it by the sum of the feature values within the window. Feature values are then sampled at random according to these probabilities to form the pooled feature map. Values that occur more often in a window are more likely to be extracted, which raises the probability that meaningful values are extracted. The work of Ian et al. [40] employs a random approach to extract the characteristics of a window. Their test results on CIFAR-100 and SVHN (Street View House Numbers) show reduced errors and increased accuracy. Zenglin et al. [41] address the issue in average pooling that features are offset and weakened when a window contains both negative and positive features. In operation, the largest feature value in the window is assigned rank 1, and the remaining feature values are ranked in turn. Based on this ranking, the method averages the feature values in ranks 1~4 and extracts the result as the final feature value.
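The two window-level rules described above can be sketched as follows. This is a hedged illustration, not the cited authors' code: the function names are hypothetical, the probabilistic rule assumes non-negative activations (for example, after ReLU), and `top_k=4` mirrors the "ranks 1~4" mentioned in the text.

```python
import numpy as np

def stochastic_pool_window(window, rng):
    """Sample one activation with probability proportional to its value.

    Assumes non-negative activations; an all-zero window returns 0.
    """
    values = window.ravel().astype(float)
    total = values.sum()
    if total == 0:
        return 0.0
    probs = values / total          # normalize so the probabilities sum to 1
    return float(rng.choice(values, p=probs))

def rank_average_pool_window(window, top_k=4):
    """Average the top_k largest activations (rank 1 = largest)."""
    ranked = np.sort(window.ravel().astype(float))[::-1]
    return float(ranked[:top_k].mean())

window = np.arange(1, 10).reshape(3, 3)   # illustrative 3 x 3 window
rng = np.random.default_rng(0)

sampled = stochastic_pool_window(window, rng)
ranked_mean = rank_average_pool_window(window)   # mean of 9, 8, 7, 6
print(sampled, ranked_mean)
```

Because larger activations receive larger probabilities, the sampling rule favors strong features without always discarding the weaker ones, whereas the rank-based rule averages only the strongest values and so avoids mixing in the low-ranked ones.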

CNN Model
CNN models have received ongoing research efforts as a core technology of computer vision. AlexNet marks the beginning of modern CNN development. With its high performance, AlexNet played a leading role in the shift from the old rule-based approaches to deep learning-based approaches in computer vision. AlexNet was also among the first CNN models to be trained on GPUs. It used two GPUs in parallel, dividing each layer into two parts that were processed in parallel.
Meanwhile, ResNet [42] addressed one of the issues of CNNs: although deeper layers are expected to improve performance, deeper networks suffer from vanishing gradients. It succeeded in training a model with a total of 152 layers. A residual block adds a skip connection that adds the input values to the output values. The block thus learns the residual between its input and output, which keeps the previously learned information and enables additional learning.
Electronics 2021, 10, 2407
DenseNet [43] added the new concept of dense connectivity to ResNet. Dense connectivity connects earlier layers to later layers, enabling additional learning in the long run. Unlike ResNet, it appends features as additional channels instead of learning on a residual basis. DenseNet was organized to keep the features of the earlier layers over the long term and to process the backpropagation of errors more efficiently.
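The difference between the two connection styles can be illustrated with a minimal sketch. The `toy_layer` below is a hypothetical stand-in for a convolution layer (a per-pixel channel mixing followed by ReLU), and the shapes are arbitrary; the sketch shows only how the input is reused, not the actual ResNet or DenseNet architectures.

```python
import numpy as np

def toy_layer(x, weight):
    """Stand-in for a conv layer: 1 x 1 channel mixing followed by ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", weight, x), 0.0)

def residual_step(x, weight):
    """ResNet style: add the input to the layer output (same channel count)."""
    return toy_layer(x, weight) + x

def dense_step(x, weight):
    """DenseNet style: append the layer output to the input as new channels."""
    return np.concatenate([x, toy_layer(x, weight)], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))            # (channels, height, width)

res_out = residual_step(x, rng.standard_normal((4, 4)))
dense_out = dense_step(x, rng.standard_normal((6, 4)))

print(res_out.shape)    # channel count unchanged
print(dense_out.shape)  # channel count grows by the layer's output channels
```

In the residual step the channel count must match so that the input can be added, while in the dense step the channel count grows with every layer because earlier features are carried along unchanged.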

Design of Proposed CMP
This section describes the design of the proposed CMP and of the pooling structure proposed for its efficient utilization. Figure 1 shows the overall block diagram of the proposed pooling structure. The datasets are crawling data and Caltech 101. The crawling data go through the entire process of the image preprocessing module, while the Caltech 101 data go through image resizing and data augmentation only. The data that have passed through the image preprocessing module are delivered to the CNN input. The CNN that receives an image extracts a feature map using convolution and pooling and then proceeds with classification. Pooling uses the proposed pooling structure and CMP because max pooling can cause overfitting and average pooling can cause features to be offset.

Data Pre-Processing Module
For the performance evaluation of the models, the study used Caltech 101 data and image data collected through crawling. Image data must be preprocessed to fit a model before they can be used [44–46]. The crawling image data were preprocessed following the process in Figure 2. The data preprocessing module followed this order: first, the images collected through crawling were labeled with the keywords used in the searches; second, the width-to-height ratio of each collected image was checked, and images with large ratio differences were removed; third, images of the same size were selected to check for redundancy. These images were converted to grayscale and compared in structure, and images of the same structure were removed except for one; fourth, the agreement between images and labels was assessed manually. Each image was checked for objects belonging to two or more of the keywords used in data collection, and images containing such objects were eliminated; fifth, the images were resized to the same size for the training and testing of a CNN model; sixth, the data were augmented through rotation and distortion of the images to increase the amount of image data.
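The second and third steps above (ratio filtering and redundancy removal) could be sketched as follows. This is an illustration under stated assumptions: the threshold `max_ratio=2.0`, the tolerance `tol`, and the mean-absolute-difference comparison are hypothetical choices standing in for the unspecified ratio criterion and structure comparison, and images are assumed to be float RGB arrays in [0, 1].

```python
import numpy as np

def has_acceptable_ratio(height, width, max_ratio=2.0):
    """Keep an image only if its long side is at most max_ratio x its short side."""
    long_side, short_side = max(height, width), min(height, width)
    return long_side / short_side <= max_ratio

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion of an (H, W, 3) array."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def is_duplicate(img_a, img_b, tol=1e-3):
    """Treat same-sized images as duplicates if their grayscale versions match."""
    if img_a.shape != img_b.shape:
        return False                      # only same-sized images are compared
    diff = np.abs(to_grayscale(img_a) - to_grayscale(img_b)).mean()
    return diff <= tol

def deduplicate(images, tol=1e-3):
    """Keep the first image of every group of duplicates."""
    kept = []
    for img in images:
        if not any(is_duplicate(img, other, tol) for other in kept):
            kept.append(img)
    return kept

rng = np.random.default_rng(0)
first = rng.random((8, 8, 3))
copy_of_first = first.copy()
different = rng.random((8, 8, 3))

unique_images = deduplicate([first, copy_of_first, different])
print(len(unique_images))
print(has_acceptable_ratio(100, 300), has_acceptable_ratio(100, 150))
```

A structural similarity measure could replace the simple pixel difference used here if near-duplicates with small shifts or compression artifacts also need to be caught.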

Design of CMP
The CMP was designed on the basis of min pooling. The old min pooling technique extracts the minimum value of a window as its representative value. It can, thus, remove many features when a feature value of 0 is nearby. This study proposes CMP, which statistically restricts the process of extracting feature values in min pooling.
CMP extracts the minimum value as the feature value, just as min pooling does, when there is no 0 in a window. When a window contains a 0 as a feature value, however, CMP counts the number of 0s in the window. The number of 0s is subjected to a constraint set by the tolerance percentage of 0 (0~1), which is given as a hyperparameter. When the proportion of 0s in a window reaches the tolerance percentage, 0 is extracted as the feature value. When it does not, the minimum value other than 0 is extracted as the feature value. CMP works in the same way as min pooling when the tolerance percentage is 0. With a tolerance percentage of 0.5, 0 is extracted as the feature value when 0s account for more than half of a window. With a tolerance percentage of 1, 0 is extracted as the feature value only when all the feature values of a window are 0. Figure 3 shows how CMP works when the tolerance percentage is 0, 0.25, 0.5, 0.75, and 1 in a window of 2 × 2 with stride 2.
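The rule described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `conditional_min_pool` is hypothetical, and the boundary case is an assumption, since the text does not state whether a proportion of 0s exactly equal to the tolerance counts. The sketch uses "greater than or equal", which is the only reading under which a tolerance of 1 still returns 0 for an all-zero window.

```python
import numpy as np

def conditional_min_pool(feature_map, tolerance=0.5, size=2, stride=2):
    """Conditional min pooling (CMP) over a 2-D feature map.

    tolerance is the proportion of zeros (0~1) a window must reach
    before 0 is emitted as its feature value.
    """
    if not 0.0 <= tolerance <= 1.0:
        raise ValueError("tolerance must lie in [0, 1]")
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = feature_map[r:r + size, c:c + size]
            zeros = np.count_nonzero(window == 0)
            if zeros == 0:
                out[i, j] = window.min()             # plain min pooling
            elif zeros / window.size >= tolerance:   # assumption: >= at the boundary
                out[i, j] = 0.0                      # enough zeros: emit 0
            else:
                out[i, j] = window[window != 0].min()  # ignore the zeros
    return out

# Illustrative 4 x 4 map whose 2 x 2 windows hold 0, 1, 2, and 4 zeros.
fmap = np.array([
    [3.0, 5.0, 0.0, 6.0],
    [2.0, 4.0, 7.0, 8.0],
    [0.0, 0.0, 0.0, 0.0],
    [9.0, 1.0, 0.0, 0.0],
])

for tol in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(tol, conditional_min_pool(fmap, tolerance=tol).tolist())
```

With a tolerance of 0 the result matches min pooling for non-negative feature maps, and raising the tolerance lets progressively more windows report their smallest non-zero value instead of 0.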

Design of Neural Network Structure
The proposed pooling structure restructures the conventional pooling structure to ensure more efficient utilization of CMP. Figure 4 shows the proposed pooling structure, which uses a 1 × 1 convolution to reduce the number of channels by half before a feature map passes through a restructured pooling layer. The restructured pooling layer was organized in two steps so that two pooling techniques, max pooling and CMP, are both applied and their results combined into one feature map. After three convolution layers, the feature map passes through max pooling and CMP in a restructured pooling structure once again. The two feature maps obtained from the pooling layers are combined along the channel dimension, without reducing their number, and sent to a fully connected layer. The detailed structure of the neural network is shown in Table 1.
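One possible reading of a single restructured pooling layer is sketched below: a 1 × 1 convolution halves the channels, max pooling and CMP are applied to the same reduced map, and the two results are concatenated along the channel axis so that the channel count returns to its original value. All function names, the ReLU after the 1 × 1 convolution, and the "greater than or equal" tolerance check are assumptions made for illustration; the authoritative layout is the one in Figure 4 and Table 1.

```python
import numpy as np

def windows_2x2(x):
    """Rearrange a (C, H, W) map into (C, H//2, W//2, 4) non-overlapping windows."""
    c, h, w = x.shape
    x = x[:, :h - h % 2, :w - w % 2]                  # drop odd trailing row/column
    x = x.reshape(c, h // 2, 2, w // 2, 2).transpose(0, 1, 3, 2, 4)
    return x.reshape(c, h // 2, w // 2, 4)

def max_pool(x):
    return windows_2x2(x).max(axis=-1)

def cmp_pool(x, tolerance=0.5):
    """Vectorized conditional min pooling for 2 x 2 windows with stride 2."""
    win = windows_2x2(x)
    zero_ratio = (win == 0).mean(axis=-1)
    has_zero = zero_ratio > 0
    nonzero_min = np.where(win == 0, np.inf, win).min(axis=-1)
    out = np.where(has_zero, nonzero_min, win.min(axis=-1))
    emit_zero = has_zero & (zero_ratio >= tolerance)   # assumption: >= at the boundary
    return np.where(emit_zero, 0.0, out)

def conv1x1(x, weight):
    """1 x 1 convolution: mix channels independently at every pixel."""
    return np.einsum("oc,chw->ohw", weight, x)

def restructured_pooling(x, weight, tolerance=0.5):
    reduced = np.maximum(conv1x1(x, weight), 0.0)      # halve channels, then ReLU (assumed)
    pooled = [max_pool(reduced), cmp_pool(reduced, tolerance)]
    return np.concatenate(pooled, axis=0)              # channel count restored

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))                     # (channels, height, width)
weight = rng.standard_normal((4, 8))                   # 8 -> 4 channels
out = restructured_pooling(x, weight)
print(out.shape)
```

Halving the channels first means that concatenating the two pooled maps does not increase the width of the network, which appears to be the purpose of the 1 × 1 convolution in this design.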

System Implementation Environment and Performance Evaluation Method
The algorithm proposed in the study was designed, implemented, and evaluated in the environment shown in Table 2. Accuracy was used to assess the performance of the proposed pooling structure and to compare it with the old models. In the performance evaluation, accuracy was calculated as the proportion of the entire data for which the model made the right prediction. Equation (1) shows the calculation.
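From the definition above, the accuracy can be written as follows; the symbols $N_{\text{correct}}$ (number of correctly predicted samples) and $N_{\text{total}}$ (number of all samples) are notation assumed here and may differ from the notation of the original equation.

```latex
\begin{equation}
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}}
\end{equation}
```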

Data Set
Caltech 101 data and crawling data were used to assess the performance of CMP and the restructured neural network. Caltech 101 is a public dataset provided by the California Institute of Technology (Caltech), consisting of 9146 images in total and 101 categories. The amount of data differs hugely among the categories, from a minimum of 31 to a maximum of 800 images. For model training, 12 categories containing 100 images or more were selected. Of them, seven categories were used after five were excluded because they contained similar or black-and-white images. These seven categories are airplanes, motorbikes, faces, watches, leopards, bonsai, and chandeliers. Figure 5 shows some of the Caltech 101 data. Table 3 shows the organization of the data before data augmentation, which was applied to increase the amount of data tenfold for use in training and testing.
Image data were collected through crawling based on Google image searches with BeautifulSoup and ChromeDriver in Python. The collected data were put through image size, redundancy, and error checks in the preprocessing module to build datasets for training and testing. Figure 6 shows some of the data collected through crawling, and Table 4 shows the organization of the crawling data. The collected data had a total of six labels: birds, boats, cars, cats, dogs, and rabbits. After preprocessing, their amount was increased tenfold through data augmentation for training and testing.

Performance Evaluation of CMP
CMP was assessed on Caltech 101 and crawling data using CNN models of the same structure but different pooling techniques. Table 5 shows the model structure used for the performance assessment, which consists of four convolution layers and two pooling layers. Two models comprised only of max pooling and only of average pooling, respectively, were compared with a model comprised of CMP and max pooling. There are two reasons for using both CMP and max pooling: first, the combination of CMP and max pooling recorded better performance than CMP alone, even without any special adjustment to the tolerance percentage; second, noise may be extracted when CMP is used alone in a shallow neural network. CMP was set to a tolerance percentage of 0.5, so that 0 is allowed when more than half of a window is 0. Each dataset was divided for training, validation, and testing as follows: 74% of the entire data was used for training the models; 16% for validation, to check changes during the training process; and 10% for testing, to check the final performance.
Figure 7 shows the performance results of pooling with Caltech data. On the graphs, the x and y axes represent the epoch and the accuracy, respectively, and the blue and orange lines represent the accuracy on the training and testing data, respectively.

Figure 7a shows the model comprised only of max pooling, whose maximum and average accuracy were 0.9846 and 0.9802, respectively. An accuracy of 0.98 or lower occurred 30 times in total, and the model showed smaller variations in performance. Figure 7b shows the model comprised of average pooling, whose maximum and average accuracy were 0.9824 and 0.9764, respectively. An accuracy of 0.98 or lower occurred 81 times in total, and the model showed bigger variations in performance. Figure 7c shows the combination of CMP and max pooling, whose maximum and average accuracy were 0.9877 and 0.9815, respectively. An accuracy of 0.98 or lower occurred 21 times in total.
Figure 8 shows the final performance results of the three models. Average pooling recorded the lowest performance in both accuracy and loss value. Max pooling and CMP showed similar accuracy, but their loss values differed, at 0.1021 for max pooling and 0.0817 for CMP. The structure using CMP thus recorded better performance than max pooling.
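The size of this loss reduction can be checked with a short calculation. The sketch below uses only the two loss values reported for Figure 8; the helper name `relative_reduction` is an assumption introduced for illustration.

```python
def relative_reduction(baseline, improved):
    """Relative reduction of improved versus baseline, in percent."""
    return (baseline - improved) / baseline * 100.0

max_pooling_loss = 0.1021   # loss reported for max pooling on Caltech 101
cmp_loss = 0.0817           # loss reported for CMP with max pooling

caltech_loss_reduction = relative_reduction(max_pooling_loss, cmp_loss)
print(f"{caltech_loss_reduction:.2f}%")
```

The result, roughly 19.98%, agrees with the lower end of the 19.98~28.71% loss reduction stated in the abstract.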
Meanwhile, Figure 9 shows the pooling performance test results with crawling data. Figure 9a shows the test results of max pooling only, whose maximum, average, and minimum accuracy were 0.8451, 0.7934, and 0.7772, respectively. Figure 9b shows the results of average pooling, whose maximum, average, and minimum accuracy were 0.8423, 0.7963, and 0.7707, respectively. Figure 9c shows the combination of CMP and max pooling, which recorded maximum, average, and minimum accuracy of 0.8433, 0.8062, and 0.7811, respectively.
Figure 9. Pooling performance comparison using crawling data. Panel (c): CMP pooling.
Figure 10 shows the final accuracy and loss values on crawling data by pooling type. The model that combined CMP and max pooling recorded the highest accuracy at 0.81 and the lowest loss at 0.23902. In this performance evaluation test, it recorded the highest performance of the three pooling techniques. Unlike with Caltech 101, average pooling recorded slightly higher performance than max pooling.

Performance Evaluation of Restructured Neural Network
The neural network restructured for the efficient utilization of CMP was assessed in performance along with AlexNet, ResNet, and DenseNet. The Caltech 101 and crawling data mentioned earlier were used in the performance test.
Additionally, Figure 11 shows the performance test results by the model with Caltech data: (a) shows the performance test results of AlexNet, which frequently had huge performance drops after keeping its overall performance at a certain level. The maximum, minimum, and average accuracy rates of AlexNet were 0.9876%, 0.8050%, and 0.9837%, respectively; (b) shows the performance results of ResNet, whose maximum, minimum, and average accuracy rates were 0.9884%, 0.8708% at the beginning of learning, and 0.9824%, respectively; (c) shows the performance results of DenseNet, which began learning at the highest accuracy rate of 0.9929% but made a huge drop in performance by failing to achieve performance stability. The maximum, minimum, and average accuracy rates of DenseNet were 0.9989%, 0.8633%, and 0.9605%, respectively; (d) shows the performance results of the proposed pooling structure, whose maximum, minimum, and average accuracy rates were 0.9843%, 0.8932, and 0.9773%, respectively. Figure 12 shows the final performance results of a model that used 10% of Caltech data in a performance test. The models were similar in performance except for DenseNet, but the proposed pooling structure recorded the highest accuracy rate of 0.9813%. AlexNet recorded the lowest loss rate at 0.0491, being followed by a proposed pooling structure at 0.2407. onics 2021, 10, x FOR PEER REVIEW 12 of 18 models were similar in performance except for DenseNet, but the proposed pooling structure recorded the highest accuracy rate of 0.9813%. AlexNet recorded the lowest loss rate at 0.0491, being followed by a proposed pooling structure at 0.2407.    Figure 13 presents the test results with crawling data. 
The models had overall similar performance results to the test results with Caltech data: (a) shows the test results of AlexNet, whose maximum, average, and minimum accuracy rates were 0.865%, 0.8575%, and 0.7202%, respectively; (b) shows the test results of ResNet, whose maximum, average, and minimum accuracy rates were 0.848%, 0.8363%, and 0.749%, respectively; (c) shows the test results of DenseNet, which recorded the best performance results with the maximum, average, and minimum accuracy rates of 0.9511%, 0.8916%, and 0.7866%, respectively; (d) shows the test results of the proposed pooling structure, whose maximum, average, and minimum accuracy rates were 0.8647%, 0.8414%, and 0.7706%, respectively.
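The maximum, minimum, and average accuracy rates quoted for each model are summaries of a per-epoch accuracy curve. The following is a minimal sketch of how such a summary could be computed, assuming the curve is available as a list of accuracy fractions; the function name `summarize_accuracy` and the sample values are hypothetical and are not taken from the experiments reported here.

```python
from statistics import mean


def summarize_accuracy(history):
    """Summarize a per-epoch accuracy curve (values assumed to be fractions in [0, 1])."""
    if not history:
        raise ValueError("history must contain at least one epoch")
    # Drop between consecutive epochs; positive means accuracy fell.
    drops = [prev - cur for prev, cur in zip(history, history[1:])]
    return {
        "max": max(history),
        "min": min(history),
        "mean": mean(history),
        # Largest single-epoch fall, or 0.0 if accuracy never decreased.
        "largest_drop": max(0.0, max(drops, default=0.0)),
    }


# Hypothetical accuracy curve, for illustration only.
example_history = [0.80, 0.95, 0.98, 0.85, 0.97]
summary = summarize_accuracy(example_history)
print(summary)
```

The `largest_drop` entry is one possible way to quantify the sudden performance drops described for AlexNet and DenseNet; other stability measures, such as the standard deviation over the final epochs, would serve equally well.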
Meanwhile, Figure 14 shows the final performance test results on the crawling data after learning. DenseNet recorded the highest accuracy rate at 0.8686%, followed by the proposed pooling structure at 0.8494%. AlexNet recorded the lowest error rate at 0.8454, followed by the proposed pooling structure at 2.327.

Figure 14. Performance evaluation of neural network models using crawling data (average accuracy and loss value).

Conclusions
In an effort to solve the issues of several pooling techniques commonly used in old CNNs, such as overfitting and the extinction of features, this study developed CMP and a restructured pooling structure that promotes its efficient utilization, and compared them with old techniques.
The CMP structure was designed based on min pooling and addressed the issue of feature extinction by assigning a tolerance to features with a value of 0. The proposed pooling structure organized the old pooling composition in two layers and applied a different pooling technique to each (max pooling and CMP) to obtain more diverse feature maps than old pooling approaches. CMP and the proposed pooling structure were evaluated in two experiments.
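The idea can be illustrated with a small sketch. It rests on two assumptions that are not spelled out in this section: that the "tolerance" means a zero is skipped when the minimum of a window is taken (a window containing only zeros still yields 0), and that the two pooling layers are applied to the same input with their outputs kept side by side. The helper names `pool`, `conditional_min`, and `two_branch_pool` are hypothetical.

```python
def pool(fmap, reduce_window, size=2, stride=2):
    """Slide a size x size window over a 2D feature map and reduce each window."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [
            reduce_window(
                [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            )
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


def conditional_min(window):
    """Assumed CMP rule: minimum over non-zero values; all-zero windows give 0.0."""
    nonzero = [v for v in window if v != 0]
    return min(nonzero) if nonzero else 0.0


def two_branch_pool(fmap):
    """Assumed structure: max pooling and CMP on the same input, returned as two maps."""
    return [pool(fmap, max), pool(fmap, conditional_min)]


# Hypothetical post-ReLU feature map with many zeros.
fmap = [
    [0, 3, 1, 0],
    [2, 5, 0, 0],
    [0, 0, 4, 6],
    [0, 0, 7, 0],
]

print(pool(fmap, min))              # plain min pooling: [[0, 0], [0, 0]]
print(pool(fmap, conditional_min))  # [[2, 1], [0.0, 4]]
print(two_branch_pool(fmap)[0])     # max pooling branch: [[5, 1], [0, 7]]
```

On this input, plain min pooling collapses every window to 0, which is the feature extinction described above, whereas the conditional variant keeps the smallest non-zero activation in each window.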
In the first experiment, CMP was compared in performance with max and average pooling. The test results based on Caltech 101 data show that the CMP technique recorded an accuracy rate of 0.9928%, which was higher than old pooling techniques by 0.16~0.52%. Its loss rate was 0.0817, which was lower than old techniques by 19.98~28.71%. In the test with collected images, its accuracy rate was 0.81%, which was higher than old techniques by 1.36~2.56%. Its loss rate was 2.3902, which was lower than old techniques by 9.22~13.28%.
In the second experiment, the pooling structure proposed to ensure the efficient utilization of CMP was assessed in performance by comparing it with the AlexNet, ResNet, and DenseNet models. In the final test with the Caltech 101 data, the proposed pooling structure recorded the highest accuracy rate at 0.9813%. AlexNet recorded the lowest error rate at 0.0491, followed by the proposed pooling structure at 0.2393. In the performance test with collected images, DenseNet recorded the highest accuracy rate at 0.8686%, followed by the proposed pooling structure at 0.8494%. AlexNet recorded the lowest error rate at 1.0769, followed by the proposed pooling structure at 2.327.
The first experiment demonstrated that CMP improved on old pooling techniques, even though the improvement was small. The results are significant enough to justify the utilization of CMP. The second experiment assessed the performance of the proposed pooling structure and found that it performed relatively well even though it was behind the old models. Based on these findings, future studies will transplant various models improved around the old convolution structure [47–59] into the pooling structure of the proposed model.
We will prepare a follow-up study that compares, validates, and tests the combined model against existing studies and established algorithms, with the aim of improving its performance. The Nemenyi test [60] and the Wilcoxon signed-rank test [61] will be conducted based on Demšar et al. [62].
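As an illustration of the planned statistical comparison, the following is a minimal sketch of the Wilcoxon signed-rank statistic for two models evaluated on the same set of datasets. It computes only the rank sums, not a p-value; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used. The function name `wilcoxon_statistic` and the paired accuracy values are hypothetical and are not results from this study.

```python
def wilcoxon_statistic(scores_a, scores_b):
    """Return (W+, W-, min(W+, W-)) for paired scores, dropping zero differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    # Indices of the differences, ordered by absolute value.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j+1.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical accuracy (in percent) of two models on six datasets.
model_a = [85, 90, 78, 92, 88, 81]
model_b = [83, 91, 74, 92, 85, 75]
print(wilcoxon_statistic(model_a, model_b))  # (14.0, 1.0, 1.0)
```

A small value of the third element relative to the number of non-zero pairs suggests a consistent difference between the two models; the Wilcoxon test covers pairwise comparisons, while the Nemenyi test is a post hoc procedure for comparing several models at once.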