# Indigenous Food Recognition Model Based on Various Convolutional Neural Network Architectures for Gastronomic Tourism Business Analytics

## Abstract


## 1. Introduction

- An empirical analysis was conducted to investigate the effect of deep-learning techniques on food recognition performance, using transfer-learning approaches as feature extractors on the Sabah Food Dataset and the VIREO-Food172 Dataset.
- A Sabah Food Dataset was created, which contains 11 different categories of popular Sabah foods. It was used to train the machine-learning model for the classification of Sabah foods.
- A preliminary prototype of a web-based application for a food recognition model is presented.
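The transfer-learning pipeline behind the first contribution, in which the convolutional base of a pre-trained CNN serves purely as a feature extractor, can be sketched in Keras as follows (a minimal illustration, not the paper's exact code; `weights=None` is used here only to avoid downloading the ImageNet weights that would be used in practice):

```python
import numpy as np
from tensorflow.keras.applications import ResNet50

# Convolutional base only (include_top=False); global average pooling
# collapses the final feature maps into one fixed-length vector per image.
base = ResNet50(weights=None,  # in practice: weights="imagenet"
                include_top=False,
                input_shape=(224, 224, 3),
                pooling="avg")
base.trainable = False  # frozen: feature extraction only, no fine-tuning

# Placeholder batch standing in for pre-processed food images.
batch = np.random.rand(2, 224, 224, 3).astype("float32")
features = base.predict(batch, verbose=0)
print(features.shape)  # one 2048-D feature vector per image
```

The frozen base converts every image into a fixed-length vector that any of the downstream classifiers described in Section 4.2 can consume.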

## 2. Related Works

## 3. A Transfer Learning Approach Using Pre-Trained Deep Learning Architecture

#### 3.1. ResNet50

#### 3.2. VGG16

#### 3.3. MobileNet

#### 3.4. Xception

#### 3.5. Inception

#### 3.6. EfficientNet

## 4. Experiments

#### 4.1. Food Dataset Preparation

#### 4.2. Feature Representations and Classifiers

1. Model denotes the convolutional base of an existing pre-trained CNN model used as a feature extractor.
2. No. of Param denotes the total number of model parameters from the input layer to the final convolutional layer.
3. Input Shape (x, y, z) denotes the three-dimensional shape of the input image data, where x is the image height, y is the image width, and z is the image depth.
4. Output Shape (x, y, z) denotes the shape of the data produced by the last convolutional layer, with x, y, and z defined as above.
5. Vector Size denotes the output shape flattened into a one-dimensional linear vector.
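As a concrete check of the last note, flattening a three-dimensional feature map yields a vector whose length is the product of the three dimensions; for example, using the EfficientNet output shape reported in the paper, (16, 16, 245):

```python
import numpy as np

# Flattening an (x, y, z) feature map into a 1-D vector of length x*y*z.
feature_map = np.zeros((16, 16, 245))
vector = feature_map.reshape(-1)
print(vector.shape)  # (62720,), since 16 * 16 * 245 = 62,720
```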

1. Layer denotes the layer name.
2. Type denotes the type of layer.
3. Output denotes the feature maps generated by the layer.
4. No. of Param denotes the number of parameters of the layer.
5. conv2d_1, conv2d_2, and conv2d_3 denote convolutional layers 1, 2, and 3.
6. max_pooling2d_1 and max_pooling2d_2 denote max-pooling layers 1 and 2.
7. dropout_1, dropout_2, dropout_3, and dropout_4 denote dropout layers 1, 2, 3, and 4.
8. flatten_1 denotes the flatten layer.
9. dense_1, dense_2, and dense_3 denote dense layers 1, 2, and 3.

#### 4.3. Performance Metrics

## 5. Results and Discussions

#### 5.1. Experiment Results

- MASFD = Mean Accuracy of Sabah Food Dataset, and
- MAVFD = Mean Accuracy of VIREO-Food172 Dataset.

#### 5.2. A Comparison of Feature Dimension Using CNN as the Classifier

#### 5.3. Food Recognition Model Deployment

## 6. Conclusions

#### Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Fam, K.S.; Syed Annuar, S.N.; Tan, K.L.; Lai, F.H.; Ingko, I.A. Touring destination and intention to consume indigenous food: A case of Kadazan-Dusun food in Sabah. Br. Food J. **2019**, 122, 1883–1896.
2. Mnguni, E.; Giampiccoli, A. Proposing a model on the recognition of indigenous food in tourism attraction and beyond. Afr. J. Hosp. Tour. Leis. **2019**, 8, 1–13.
3. Noor, A.M.; Remeli, M.R.B.; Hanafiah, M.H.M. International tourist acceptance of Sabah’s gastronomy product. Curr. Issues Hosp. Tour. Res. Innov. **2012**, 57, 377.
4. Danting, Z.; Quoquab, F.; Mahadi, N. Enhancing the Tourism Operation Success in Sabah Malaysia: A Conceptual Framework. Int. J. Eng. Technol. **2018**, 7, 147–151.
5. Nasrudin, N.H.; Harun, A.F. A preliminary study on digital image performance to stimulate food taste experience. Bull. Electr. Eng. Inform. **2020**, 9, 2154–2161.
6. Kiourt, C.; Pavlidis, G.; Markantonatou, S. Deep Learning Approaches in Food Recognition. In Machine Learning Paradigms; Springer: Cham, Switzerland, 2020; pp. 83–108.
7. Prasanna, N.; Mouli, D.C.; Sireesha, G.; Priyanka, K.; Radha, D.; Manmadha, B. Classification of Food Categories and Ingredients Approximation using an FD-Mobilenet and TF-YOLO. Int. J. Adv. Sci. Technol. **2020**, 29, 3101–3114.
8. Upreti, A.; Malathy, D.C. Food Item Recognition, Calorie Count and Recommendation using Deep Learning. Int. J. Adv. Sci. Technol. **2020**, 29, 2216–2222.
9. Yang, H.; Kang, S.; Park, C.; Lee, J.; Yu, K.; Min, K. A Hierarchical deep model for food classification from photographs. KSII Trans. Internet Inf. Syst. **2020**, 14, 1704–1720.
10. Razali, M.N.; Manshor, N. A Review of Handcrafted Computer Vision and Deep Learning Approaches for Food Recognition. Int. J. Adv. Sci. Technol. **2020**, 29, 13734–13751.
11. Mohamed, R.; Perumal, T.; Sulaiman, M.; Mustapha, N. Multi-resident activity recognition using label combination approach in smart home environment. In Proceedings of the 2017 IEEE International Symposium on Consumer Electronics (ISCE), Kuala Lumpur, Malaysia, 14–15 November 2017; pp. 69–71.
12. Zainudin, M.; Sulaiman, M.; Mustapha, N.; Perumal, T.; Mohamed, R. Two-stage feature selection using ranking self-adaptive differential evolution algorithm for recognition of acceleration activity. Turk. J. Electr. Eng. Comput. Sci. **2018**, 26, 1378–1389.
13. Moung, E.G.; Dargham, J.A.; Chekima, A.; Omatu, S. Face recognition state-of-the-art, enablers, challenges and solutions: A review. Int. J. Adv. Trends Comput. Sci. Eng. **2020**, 9, 96–105.
14. Dargham, J.A.; Chekima, A.; Moung, E.G. Fusing facial features for face recognition. In Distributed Computing and Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2012; pp. 565–572.
15. Dargham, J.A.; Chekima, A.; Moung, E.; Omatu, S. Data fusion for face recognition. In Distributed Computing and Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2010; pp. 681–688.
16. Yahya, F.; Fazli, B.; Sallehudin, H.; Jaya, M. Machine Learning in Dam Water Research: An Overview of Applications and Approaches. Int. J. Adv. Trends Comput. Sci. Eng. **2020**, 9, 1268–1274.
17. Lu, Y. Food Image Recognition by Using Convolutional Neural Networks (CNNs). arXiv **2016**, arXiv:1612.00983.
18. Subhi, M.A.; Ali, S.M. A Deep Convolutional Neural Network for Food Detection and Recognition. In Proceedings of the 2018 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES), Kuching, Sarawak, Malaysia, 3–6 December 2018; pp. 284–287.
19. Islam, M.T.; Karim Siddique, B.M.N.; Rahman, S.; Jabid, T. Food Image Classification with Convolutional Neural Network. In Proceedings of the 2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Bangkok, Thailand, 21–24 October 2018.
20. Jeny, A.A.; Junayed, M.S.; Ahmed, T.; Habib, M.T.; Rahman, M.R. FoNet: Local food recognition using deep residual neural networks. In Proceedings of the 2019 International Conference on Information Technology (ICIT), Bhubaneswar, India, 20–22 December 2019.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
22. Zahisham, Z.; Lee, C.P.; Lim, K.M. Food Recognition with ResNet-50. In Proceedings of the 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Kota Kinabalu, Malaysia, 26–27 September 2020; pp. 1–5.
23. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
24. Taşkıran, M.; Kahraman, N. Comparison of CNN Tolerances to Intra Class Variety in Food Recognition. In Proceedings of the 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), Sofia, Bulgaria, 3–5 July 2019; pp. 1–5.
25. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv **2017**, arXiv:1704.04861.
26. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
27. Yao, N.; Ni, F.; Wang, Z.; Luo, J.; Sung, W.-K.; Luo, C.; Li, G. L2MXception: An improved Xception network for classification of peach diseases. Plant Methods **2021**, 17, 1–13.
28. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
29. Singla, A.; Yuan, L.; Ebrahimi, T. Food/Non-food Image Classification and Food Categorization using Pre-Trained GoogLeNet Model. In Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, Amsterdam, The Netherlands, 16 October 2016.
30. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv **2019**, arXiv:1905.11946.
31. Liu, J.; Wang, M.; Bao, L.; Li, X. EfficientNet based recognition of maize diseases by leaf image classification. J. Phys. Conf. Ser. **2020**, 1693, 012148.
32. Chen, J.; Ngo, C.-W. Deep-based Ingredient Recognition for Cooking Recipe Retrieval. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 32–41.
33. Hatcher, W.G.; Yu, W. A Survey of Deep Learning: Platforms, Applications and Emerging Research Trends. IEEE Access **2018**, 6, 24411–24432.
34. Dargham, J.A.; Chekima, A.; Moung, E.G.; Omatu, S. The Effect of Training Data Selection on Face Recognition in Surveillance Application. Adv. Intell. Syst. Comput. **2015**, 373, 227–234.

**Figure 3.** A comparison of the performance of six feature representations and ten classifiers on the Sabah Food Dataset.

**Figure 4.** A comparison of the performance of six feature representations and ten classifiers on the VIREO-Food172 Dataset.

| Authors | Dataset | Number of Categories | Techniques | Results |
|---|---|---|---|---|
| Lu (2016) [17] | Small-scale dataset | 10 | A proposed CNN configuration with 3 convolution-pooling layers and 1 fully connected layer. | Test set accuracy of 74% |
| Subhi and Ali (2018) [18] | Self-collected Malaysian foods dataset | 11 | Modified VGG19-based CNN model with 21 convolutional layers and 3 fully connected layers. | Not reported |
| Islam et al. (2018) [19] | Food-11 dataset | 11 | (i) A proposed CNN configuration with 5 convolution layers, 3 max-pooling layers, and 1 fully connected layer. (ii) Inception V3 pre-trained model with 2 fully connected layers. | (i) Proposed approach achieved 74.7% accuracy. (ii) Pre-trained Inception V3 achieved 92.86% accuracy. |
| Jeny et al. (2019) [20] | Self-collected Bangladesh foods dataset | 6 | FoNet-based deep residual neural network with 47 layers, comprising pooling layers, activation functions, flatten layers, dropout, and normalization. | Test set accuracy of 98.16% |

| Category Label | Category Name | Number of Images |
|---|---|---|
| 1 | Bakso | 85 |
| 2 | Sinalau Bakas | 219 |
| 3 | Ambuyat | 242 |
| 4 | Barobbo | 83 |
| 5 | Buras | 198 |
| 6 | Martabak Jawa | 92 |
| 7 | Nasi Kuning | 245 |
| 8 | Mee Tauhu | 145 |
| 9 | Hinava | 164 |
| 10 | Latok | 236 |
| 11 | Nasi Lalap | 217 |

| Category Label | Category Name | Number of Images |
|---|---|---|
| 1 | Braised pork | 1023 |
| 2 | Sautéed spicy pork | 987 |
| 3 | Crispy sweet and sour pork slices | 991 |
| 4 | Steamed pork with rice powder | 803 |
| 5 | Pork with salted vegetable | 997 |
| 6 | Shredded pork with pepper | 708 |
| 7 | Yu-Shiang shredded pork | 1010 |
| 8 | Eggs, black fungus, and sautéed sliced pork | 830 |
| 9 | Braised spare ribs in brown sauce | 712 |
| 10 | Fried sweet and sour tenderloin | 954 |

| Model | No. of Param | Input Shape (x, y, z) | Output Shape (Conv2D) (x, y, z) | Vector Size (Conv1D) |
|---|---|---|---|---|
| ResNet50 | 25,636,712 | (224, 224, 3) | (32, 32, 2) | (1, 2048) |
| VGG16 | 138,357,544 | (224, 224, 3) | (64, 64, 1) | (1, 4096) |
| MobileNet | 3,228,864 | (64, 64, 3) | (128, 128, 2) | (1, 32,768) |
| Xception | 22,910,480 | (299, 299, 3) | (32, 32, 2) | (1, 2048) |
| Inception V3 | 21,802,784 | (299, 299, 3) | (128, 128, 3) | (1, 49,152) |
| EFFNet | 5,330,564 | (224, 224, 3) | (16, 16, 245) | (1, 62,720) |

| Layer | Type | Output | No. of Param |
|---|---|---|---|
| conv2d_1 | (Conv2D) | (None, 64, 64) | 500 |
| conv2d_2 | (Conv2D) | (None, 64, 64) | 33,825 |
| max_pooling2d_1 | (MaxPooling2D) | (None, 32, 32) | 0 |
| dropout_1 | (Dropout) | (None, 32, 32) | 0 |
| conv2d_3 | (Conv2D) | (None, 32, 32) | 84,500 |
| max_pooling2d_2 | (MaxPooling2D) | (None, 16, 16) | 0 |
| dropout_2 | (Dropout) | (None, 16, 16) | 0 |
| flatten_1 | (Flatten) | (None, 32,000) | 0 |
| dense_1 | (Dense) | (None, 500) | 16,000,500 |
| dropout_3 | (Dropout) | (None, 500) | 0 |
| dense_2 | (Dense) | (None, 250) | 125,250 |
| dropout_4 | (Dropout) | (None, 250) | 0 |
| dense_3 | (Dense) | (None, 12) | 3012 |
| Total parameters: | 16,247,587 | | |
| Trainable parameters: | 16,247,587 | | |
| Non-trainable parameters: | 0 | | |
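The dense-layer parameter counts in the table follow the standard fully connected formula, parameters = inputs × units + units (one bias per unit), which can be verified directly:

```python
# Parameter count of a fully connected (Dense) layer: weights + biases.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

print(dense_params(32000, 500))  # dense_1: 16,000,500
print(dense_params(500, 250))    # dense_2: 125,250
print(dense_params(250, 12))     # dense_3: 3,012
```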

| Parameters | Value | Description |
|---|---|---|
| C | 1.0 | Regularization parameter. |
| kernel | rbf | Specifies the kernel type to be used in the algorithm. |
| degree | 3 | Degree of the polynomial kernel function (‘poly’). |
| gamma | scale | Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. |
| coef0 | 0.0 | Independent term in kernel function. |
| decision_function_shape | ovo | Multi-class strategy. |

| Parameters | Value | Description |
|---|---|---|
| C | 1.0 | Regularization parameter. |
| kernel | rbf | Specifies the kernel type to be used in the algorithm. |
| degree | 3 | Degree of the polynomial kernel function (‘poly’). |
| gamma | scale | Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. |
| coef0 | 0.0 | Independent term in kernel function. |
| decision_function_shape | ovr | Multi-class strategy. |

| Parameters | Value | Description |
|---|---|---|
| penalty | l2 | Specifies the norm used in the penalization. The ‘l1’ penalty leads to coef_ vectors that are sparse. |
| loss | squared_hinge | Specifies the loss function. ‘hinge’ is the standard SVM loss (used, e.g., by the SVC class), while ‘squared_hinge’ is the square of the hinge loss. |
| dual | True | Selects the algorithm to solve either the dual or primal optimization problem. |
| tol | 0.0001 | Tolerance for stopping criteria. |
| C | 1.0 | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. |
| multi_class | ovo | Multi-class strategy. |
| intercept_scaling | 1 | When self.fit_intercept is True, instance vector x becomes [x, self.intercept_scaling], i.e., a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. |
| class_weight | None | Sets the parameter C of class i to class_weight[i]*C. If not given, all classes are assumed to have weight one. |
| verbose | 0 | Enables verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context. |
| random_state | None | Controls the pseudo-random number generation for shuffling the data for the dual coordinate descent (if dual = True). When dual = False, the underlying implementation of LinearSVC is not random and random_state has no effect on the results. |
| max_iter | 1000 | The maximum number of iterations to be run. |

| Parameters | Value | Description |
|---|---|---|
| penalty | l2 | Specifies the norm used in the penalization. The ‘l1’ penalty leads to coef_ vectors that are sparse. |
| loss | squared_hinge | Specifies the loss function. ‘hinge’ is the standard SVM loss (used, e.g., by the SVC class), while ‘squared_hinge’ is the square of the hinge loss. |
| dual | True | Selects the algorithm to solve either the dual or primal optimization problem. |
| tol | 1e-4 | Tolerance for stopping criteria. |
| C | 1.0 | Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. |
| multi_class | ovr | Multi-class strategy. |
| intercept_scaling | 1 | When self.fit_intercept is True, instance vector x becomes [x, self.intercept_scaling], i.e., a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. |
| class_weight | None | Sets the parameter C of class i to class_weight[i]*C. If not given, all classes are assumed to have weight one. |
| verbose | 0 | Enables verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context. |
| random_state | None | Controls the pseudo-random number generation for shuffling the data for the dual coordinate descent (if dual = True). When dual = False, the underlying implementation of LinearSVC is not random and random_state has no effect on the results. |
| max_iter | 1000 | The maximum number of iterations to be run. |
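The two SVM configurations above map directly onto scikit-learn’s `SVC` and `LinearSVC` classes. A minimal sketch, with the iris toy dataset standing in for the extracted CNN features (the `random_state` values are fixed here for reproducibility, and `LinearSVC`’s default one-vs-rest behaviour plays the role of the multi_class setting):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, LinearSVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Kernel SVM with a one-vs-one decision function ("SVM (OVO)" in the tables).
svm_ovo = SVC(C=1.0, kernel="rbf", degree=3, gamma="scale",
              coef0=0.0, decision_function_shape="ovo")
svm_ovo.fit(X_tr, y_tr)

# Linear SVM; one-vs-rest is LinearSVC's multi-class strategy
# ("LSVM (OVA)" in the tables).
lsvm_ova = LinearSVC(penalty="l2", loss="squared_hinge", dual=True,
                     tol=1e-4, C=1.0, max_iter=1000, random_state=0)
lsvm_ova.fit(X_tr, y_tr)

print(svm_ovo.score(X_te, y_te), lsvm_ova.score(X_te, y_te))
```

In the paper’s pipeline, `X` would instead hold the flattened feature vectors produced by the pre-trained convolutional bases.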

| Parameters | Value | Description |
|---|---|---|
| criterion | gini | The function used to measure the quality of a split. |
| splitter | best | The strategy used to choose the split at each node. |
| max_depth | None | The maximum depth of the tree. |
| min_samples_split | 2 | The minimum number of samples required to split an internal node. |
| min_samples_leaf | 1 | The minimum number of samples required to be at a leaf node. |

| Parameters | Value | Description |
|---|---|---|
| var_smoothing | 1e-9 | Portion of the largest variance of all features that is added to variances for calculation stability. |
| sample_weight | None | Weights applied to individual samples. |
| deep | True | If true, returns the parameters for this estimator and any contained sub-objects that are estimators. |

| Parameters | Value | Description |
|---|---|---|
| hidden_layer_sizes | (100,) | The ith element represents the number of neurons in the ith hidden layer. |
| activation | relu | Activation function for the hidden layer. |
| solver | adam | The solver for weight optimization. |
| alpha | 0.0001 | L2 penalty (regularization term) parameter. |
| batch_size | auto | Size of minibatches for stochastic optimizers. |
| learning_rate | constant | Learning rate schedule for weight updates. |
| max_iter | 200 | The maximum number of iterations. |

| Parameters | Value | Description |
|---|---|---|
| n_estimators | 100 | The number of trees in the forest. |
| criterion | gini | The function to measure the quality of a split. |
| max_depth | None | The maximum depth of the tree. |
| min_samples_split | 2 | The minimum number of samples required to split an internal node. |
| min_samples_leaf | 1 | The minimum number of samples required to be at a leaf node. |
| max_features | auto | The number of features to consider when looking for the best split. |

| Parameters | Value | Description |
|---|---|---|
| n_neighbors | 5 | Number of neighbors to use by default for kneighbors queries. |
| weights | uniform | Weight function used in prediction. |
| algorithm | auto | Algorithm used to compute the nearest neighbors. |
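The remaining scikit-learn classifiers are instantiated with the (largely default) parameter values listed in the tables above; a compact sketch, again with the iris toy dataset standing in for the CNN features and `random_state` fixed for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(criterion="gini", splitter="best",
                                            random_state=0),
    "Naive Bayes": GaussianNB(var_smoothing=1e-9),
    "ANN": MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                         solver="adam", alpha=1e-4, max_iter=200,
                         random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, criterion="gini",
                                            random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5, weights="uniform",
                                algorithm="auto"),
}
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
for name, acc in scores.items():
    print(name, round(acc, 3))
```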

| Parameters | Value | Description |
|---|---|---|
| kernel_size | 32 (3,3) | Determines the dimensions of the convolution kernel. |
| strides | (1,1) | An integer or tuple/list of 2 integers specifying the stride of the convolution along the height and width of the input volume. |
| padding | valid | The padding parameter of the Keras Conv2D class can take one of two values: ‘valid’ or ‘same’. |
| activation | relu | A string specifying the name of the activation function to apply after performing the convolution. |
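The effect of the padding and strides parameters on the spatial output size follows standard convolution arithmetic: floor((n − k)/s) + 1 for ‘valid’ padding and ceil(n/s) for ‘same’ padding, illustrated below for a 64-pixel input, 3×3 kernel, and stride 1:

```python
import math

# Spatial output size of a convolution for the two Keras padding modes.
def conv_out(size, kernel, stride, padding):
    if padding == "valid":   # no padding: the window must fit inside the input
        return (size - kernel) // stride + 1
    if padding == "same":    # zero-padded so stride 1 preserves the input size
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding: {padding}")

print(conv_out(64, 3, 1, "valid"))  # 62
print(conv_out(64, 3, 1, "same"))   # 64
```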

| Machine Learning Approaches | Accuracy |
|---|---|
| EFFNet + CNN | 0.9401 |
| Xception + SVM (OVO) | 0.8632 |
| Xception + CNN | 0.8620 |
| EFFNet + LSVM (OVA) | 0.8601 |
| EFFNet + LSVM (OVO) | 0.8553 |
| Xception + LSVM (OVO) | 0.8522 |
| InceptionV3 + LSVM (OVA) | 0.8475 |
| EFFNet + SVM (OVO) | 0.8459 |
| VGG16 + LSVM (OVA) | 0.8082 |
| Xception + LSVM (OVA) | 0.8003 |

| Machine Learning Approaches | Accuracy |
|---|---|
| EFFNet + SVM (OVO) | 0.8657 |
| EFFNet + LSVM (OVO) | 0.8560 |
| EFFNet + LSVM (OVA) | 0.8553 |
| EFFNet + SVM (OVA) | 0.8516 |
| Xception + SVM (OVO) | 0.8489 |
| Xception + LSVM (OVO) | 0.8382 |
| Xception + LSVM (OVA) | 0.8304 |
| EFFNet + KNN | 0.8035 |
| InceptionV3 + SVM (OVO) | 0.8025 |
| InceptionV3 + SVM (OVA) | 0.7917 |

| Machine Learning Approaches | Accuracy |
|---|---|
| EFFNet + CNN | 0.9401 |
| Xception + SVM (OVO) | 0.8632 |
| Inception V3 + LSVM (OVA) | 0.8475 |
| VGG16 + LSVM (OVA) | 0.8082 |
| MobileNet + CNN | 0.7708 |
| Color + CNN | 0.7422 |
| ResNet50 + LSVM (OVA) | 0.5236 |

| Machine Learning Approaches | Accuracy |
|---|---|
| EFFNet + SVM (OVO) | 0.8657 |
| Xception + SVM (OVO) | 0.8489 |
| Inception V3 + SVM (OVO) | 0.8025 |
| VGG16 + LSVM (OVO) | 0.7725 |
| MobileNet + LSVM (OVO) | 0.6332 |
| ResNet50 + LSVM (OVA) | 0.4519 |
| Color + CNN | 0.4237 |

| Machine Learning Approaches | Accuracy |
|---|---|
| EFFNet + CNN | 0.9401 |
| Xception + SVM (OVO) | 0.8632 |
| EFFNet + LSVM (OVA) | 0.8601 |
| EFFNet + LSVM (OVO) | 0.8553 |
| Inception V3 + KNN | 0.7783 |
| Xception + SVM (OVA) | 0.7657 |
| EFFNet + Naïve Bayes | 0.7642 |
| Inception V3 + Random Forest | 0.6368 |
| Inception V3 + Decision Tree | 0.5142 |
| VGG16 + ANN | 0.3899 |

| Machine Learning Approaches | Accuracy |
|---|---|
| EFFNet + SVM (OVO) | 0.8657 |
| EFFNet + LSVM (OVO) | 0.8560 |
| EFFNet + LSVM (OVA) | 0.8553 |
| EFFNet + SVM (OVA) | 0.8516 |
| EFFNet + KNN | 0.8035 |
| EFFNet + Naïve Bayes | 0.7561 |
| EFFNet + ANN | 0.7315 |
| EFFNet + Random Forest | 0.7201 |
| Xception + CNN | 0.7182 |
| EFFNet + Decision Tree | 0.5791 |

| Feature Representation | Mean Accuracy of Sabah Food Dataset | Mean Accuracy of VIREO-Food172 Dataset | Overall Score |
|---|---|---|---|
| EFFNet | 0.6311 | 0.7714 | 0.7013 |
| Xception | 0.5991 | 0.7017 | 0.6504 |
| Inception V3 | 0.6240 | 0.6375 | 0.6308 |
| VGG16 | 0.5770 | 0.5896 | 0.5833 |
| MobileNet | 0.5053 | 0.3516 | 0.4285 |
| ResNet50 | 0.3121 | 0.2977 | 0.3049 |
| Color | 0.3626 | 0.2370 | 0.2998 |
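The Overall Score column is the arithmetic mean of the two dataset accuracies (MASFD and MAVFD), as can be checked against the EFFNet row:

```python
# Overall score = mean of the Sabah Food (MASFD) and VIREO-Food172
# (MAVFD) accuracies; EFFNet row as an example.
masfd, mavfd = 0.6311, 0.7714
overall = (masfd + mavfd) / 2
print(overall)  # ~0.7013, matching the table to four decimal places
```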

| Classifier | Mean Accuracy of Sabah Food Dataset | Mean Accuracy of VIREO-Food172 Dataset | Overall Score |
|---|---|---|---|
| LSVM (OVO) | 0.6941 | 0.6466 | 0.6704 |
| LSVM (OVA) | 0.6954 | 0.5976 | 0.6465 |
| SVM (OVO) | 0.6049 | 0.6389 | 0.6219 |
| CNN | 0.6431 | 0.5555 | 0.5993 |
| kNN | 0.5117 | 0.5041 | 0.5079 |
| SVM (OVA) | 0.4398 | 0.5656 | 0.5027 |
| Naïve Bayes | 0.5133 | 0.4725 | 0.4929 |
| Random Forest | 0.5071 | 0.4633 | 0.4852 |
| Decision Tree | 0.3933 | 0.3714 | 0.3824 |
| ANN | 0.1563 | 0.3082 | 0.2323 |

| Work | Dataset | Number of Categories | Model | Accuracy (%) |
|---|---|---|---|---|
| Our proposed method | Sabah Food Dataset | 11 | EFFNet + CNN | 94.01 |
| Our proposed method | VIREO-Food172 Dataset | The first ten categories in the VIREO-Food172 Dataset | EFFNet + SVM (OVO) | 85.57 |
| Jeny et al. (2019) [20] | Self-collected Bangladesh foods dataset | 6 | FoNet | 98.16 |
| Islam et al. (2018) [19] | Food-11 dataset | 11 | InceptionV3 + CNN | 92.86 |
| Chen and Ngo [32] | VIREO-Food172 Dataset | 20 | MultiTaskCNN | 82.12 (Top-1) |
| Lu (2016) [17] | Small-scale dataset | 10 | CNN | 74.00 |

| Feature Representation | Feature Dimension | Overall Score |
|---|---|---|
| EFFNet | 62,720 | 0.7013 |
| Xception | 2048 | 0.6504 |
| Inception V3 | 49,152 | 0.6308 |
| VGG16 | 4096 | 0.5833 |
| MobileNet | 32,768 | 0.4285 |
| ResNet50 | 2048 | 0.3049 |
| Color | 4096 | 0.2998 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Razali, M.N.; Moung, E.G.; Yahya, F.; Hou, C.J.; Hanapi, R.; Mohamed, R.; Hashem, I.A.T.
Indigenous Food Recognition Model Based on Various Convolutional Neural Network Architectures for Gastronomic Tourism Business Analytics. *Information* **2021**, *12*, 322.
https://doi.org/10.3390/info12080322
