# Efficient Net-XGBoost: An Implementation for Facial Emotion Recognition Using Transfer Learning


## Abstract


## 1. Introduction

- (i) To implement a robust FER technique that exploits the power of transfer learning through the EfficientNet-XGBoost model.
- (ii) To add fully connected layers to the model and fine-tune it to attain high accuracy.
- (iii) To analyze the proposed method's proficiency by comparing its emotion-recognition accuracy with that of other methods currently in use.

## 2. Related Work

## 3. Materials and Methods

#### 3.1. Deep Learning Using Transfer Learning

- DwC—Depthwise Convolution
- BN—Batch Normalisation
- SE—Squeeze and Excitation
- Swish—activation function, x · sigmoid(x)

**Figure 2.** EfficientNet blocks: (**a**–**c**) are the three basic building blocks. h, w, and c denote the input height, width, and channels for all the MBConv blocks; C denotes the output channels of the two blocks.
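The Swish activation used after each batch-normalisation step in these blocks can be sketched in NumPy (a minimal illustration, not the authors' code):

```python
import numpy as np

def swish(x: np.ndarray) -> np.ndarray:
    """Swish activation: x * sigmoid(x), applied after BN in the MBConv blocks."""
    return x * (1.0 / (1.0 + np.exp(-x)))

# Swish is smooth and non-monotonic: large positive inputs pass through almost
# unchanged, while negative inputs are damped rather than zeroed (unlike ReLU).
```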

#### 3.2. Fully Connected Layer

- $\kappa $ denotes the desired results.
- $\rho $ is the probability of the real-valued representations.
- If $\rho = 1$, the neuron holding a real value is deactivated; otherwise it is activated.
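The dropout behaviour described above can be sketched with NumPy (an illustrative inverted-dropout forward pass; the rate 0.5 matches Algorithm 3, not necessarily the authors' exact implementation):

```python
import numpy as np

def dropout(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, then scale
    survivors by 1/(1-rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate  # True = keep the neuron active
    return x * mask / (1.0 - rate)

# At inference (training=False) the layer is an identity; during training,
# a kept unit in an all-ones input becomes 2.0 and a dropped one becomes 0.
```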

#### 3.3. XGBoost

- F—objective function
- ${g}_{i}$—first derivative of the mean squared error
- w—vector of scores on the leaves
- ${h}_{i}$—second derivative of the mean squared error
- $\lambda $—penalty term
- T—number of leaves
- $\rho $—leaf complexity
- ${I}_{j}$—set of data samples in leaf node j
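With the notation above, the regularized objective minimized at each boosting round, and the resulting optimal leaf score, can be written out (the standard XGBoost formulation, here using $\rho$ for the per-leaf complexity penalty as this paper does):

```latex
F = \sum_{j=1}^{T} \left[ \Big(\textstyle\sum_{i \in I_j} g_i\Big) w_j
      + \frac{1}{2} \Big(\textstyle\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \right] + \rho T,
\qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}
```

Each leaf's optimal score balances the accumulated gradients against the accumulated Hessians plus the penalty $\lambda$, while $\rho T$ discourages growing trees with many leaves.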

## 4. Proposed Work

#### 4.1. Proposed Algorithm

```
Algorithm 1: MBConv1(K × K, B, S)
Require: K: kernel size, B: output feature maps, S: stride,
         R: reduction ratio of SE, T: total images {X1, X2, ..., XT}
1: dwc  ⇐ DepthwiseConv(K × K, M, S)
2: bn   ⇐ BatchNormalization(dwc)
3: sw   ⇐ Swish(bn)
4: se   ⇐ SE(R = 4, sw)
5: conv ⇐ Conv(1 × 1, B, 1, se)
6: bn   ⇐ BatchNormalization(conv)
7: return (h/S × w/S, B = bn)
```

```
Algorithm 2: MBConv6(K × K, B, S)
Require: K: kernel size, B: output feature maps, S: stride,
         R: reduction ratio of SE, T: total images {X1, X2, ..., XT}
1: conv ⇐ Conv(1 × 1, 6M, 1)             ▷ expansion
2: dwc  ⇐ DepthwiseConv(K × K, 6M, S)
3: bn   ⇐ BatchNormalization(dwc)
4: sw   ⇐ Swish(bn)
5: se   ⇐ SE(R = 4, sw)
6: conv ⇐ Conv(1 × 1, B, 1, se)          ▷ projection
7: bn   ⇐ BatchNormalization(conv)
8: return (h/S × w/S, B = bn)
```

```
Algorithm 3: EFFICIENTNET-XGBOOST()
Ensure: weights ⇐ ImageNet weights
Ensure: biases ⇐ ImageNet biases
Ensure: input ⇐ (48, 48, 3); T is the total number of images
 1: for i in range(0, T) do
        conv ⇐ Conv(3 × 3, image)
        bn ⇐ BatchNormalization(conv)
        sw ⇐ bn * sigmoid(bn)
    end
 2: mbc1 ⇐ MBConv1(3 × 3, B, S, sw)           ▷ 16 rounds
    for i in range(0, 2): mbc6 ⇐ MBConv6(3 × 3, B, S, mbc1)
    for i in range(0, 2): mbc6 ⇐ MBConv6(5 × 5, B, S, mbc6)
    for i in range(0, 3): mbc6 ⇐ MBConv6(3 × 3, B, S, mbc6)
    for i in range(0, 6): mbc6 ⇐ MBConv6(5 × 5, B, S, mbc6)
 3: mbc6 ⇐ MBConv6(3 × 3, B, S)(mbc6)
 4: conv1 ⇐ Conv(1 × 1, M, S)(mbc6)           ▷ fully connected head
 5: pool ⇐ MaxPool2D(pool_size = [1, 1], padding = 'valid', S = 2)
 6: d ⇐ Dropout(0.5, pool)
 7: de ⇐ Dense(N = 1024, d)
    for i in [feature_maps]:
        train_y ⇐ train[neurons = 1024]
        train_x ⇐ train.drop[neurons = 1024]
        dataset ⇐ xgboost.DMatrix(train_x, label = train_y)
 8: params: max_depth = 7, eta = 0.2, num_classes = 7, objective = softmax
 9: Xg ⇐ XGBOOST.train(params, dataset, num_boost_round = 200)
10: Yhat ⇐ Xg.predict(x_test)
11: score ⇐ accuracy_score(test_y, Yhat)
12: end
13: Output: score
```
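Steps 4–7 of Algorithm 3 hand a 1024-dimensional feature vector per image to XGBoost. That hand-off can be sketched as a plain NumPy forward pass (illustrative shapes and random weights only; the pooling step is simplified to a global max pool here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the last MBConv6 feature maps: batch of 8, 7x7 spatial, 320 channels.
# The paper actually feeds 48x48x3 face images through EfficientNet first.
features = rng.standard_normal((8, 7, 7, 320))

pooled = features.max(axis=(1, 2))              # pooling over space -> (8, 320)
keep = rng.random(pooled.shape) >= 0.5          # Dropout(0.5), training mode
dropped = pooled * keep / 0.5
W = rng.standard_normal((320, 1024)) * 0.01     # Dense(1024) weights (random here)
dense = np.maximum(dropped @ W, 0.0)            # (8, 1024) features for XGBoost
```

Each row of `dense` is one image's feature vector; stacking these rows gives the training matrix wrapped in `xgboost.DMatrix` at the end of step 7.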

#### 4.2. Experimental Setup

**Model Training and Evaluation:** Training and testing were performed on the Google Colab cloud platform, using TensorFlow 2.6 on Python 3.7 with a Tesla P100-PCIE-16GB GPU and an Intel(R) Xeon(R) CPU running at 2.20 GHz. Transfer learning lets researchers reuse the learned weights and unfreeze selected layers as required, which aided the development of the FER system. We used the Adam optimizer, an adaptive learning rate method, for training. The batch size is 32, with epochs ranging from 100 to 150. The parameters used are shown in Table 2.
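As a reminder of why Adam is an adaptive learning rate method, a single update step can be written out in NumPy (textbook Adam with its usual default β values; the hyperparameters here are illustrative, not the paper's):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and its
    square (v), bias-corrected, then a per-parameter scaled step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)          # bias correction for the first moment
    v_hat = v / (1 - b2**t)          # bias correction for the second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m = v = np.zeros_like(w)
w, m, v = adam_step(w, grad=np.array([0.5, -0.5]), m=m, v=v, t=1)
# On the first step the bias-corrected moments cancel, so each weight moves
# by approximately lr in the direction opposite its gradient's sign.
```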

#### 4.3. Datasets

#### 4.3.1. CK+

#### 4.3.2. KDEF

#### 4.3.3. JAFFE

#### 4.3.4. FER2013

#### 4.4. Data Pre-Processing

## 5. Results and Discussion

#### 5.1. Training and Validation

#### 5.2. Analysis Using Confusion Matrix

#### 5.3. Analysis of Classification Performance

#### 5.4. Feature Maps

#### 5.5. Receiver Operating Characteristic

#### 5.6. Comparison of Results with Other Works

## 6. Conclusions and Future Scope

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Akhand, M.; Roy, S.; Siddique, N.; Kamal, M.A.S.; Shimamura, T. Facial emotion recognition using transfer learning in the deep CNN. *Electronics* **2021**, *10*, 1036.
- Minaee, S.; Minaei, M.; Abdolrashidi, A. Deep-emotion: Facial expression recognition using attentional convolutional network. *Sensors* **2021**, *21*, 3046.
- Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.H.; et al. Challenges in representation learning: A report on three machine learning contests. In Proceedings of the International Conference on Neural Information Processing, Daegu, Republic of Korea, 3–7 November 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 117–124.
- Pons, G.; Masip, D. Supervised committee of convolutional neural networks in automated facial expression analysis. *IEEE Trans. Affect. Comput.* **2017**, *9*, 343–350.
- Wen, G.; Hou, Z.; Li, H.; Li, D.; Jiang, L.; Xun, E. Ensemble of deep neural networks with probability-based fusion for facial expression recognition. *Cogn. Comput.* **2017**, *9*, 597–610.
- Jabid, T.; Kabir, M.H.; Chae, O. Robust facial expression recognition based on local directional pattern. *ETRI J.* **2010**, *32*, 784–794.
- Mahendran, A.; Vedaldi, A. Visualizing deep convolutional neural networks using natural pre-images. *Int. J. Comput. Vis.* **2016**, *120*, 233–255.
- Wang, K.; Peng, X.; Yang, J.; Meng, D.; Qiao, Y. Region attention networks for pose and occlusion robust facial expression recognition. *IEEE Trans. Image Process.* **2020**, *29*, 4057–4069.
- Simonyan, K.; Vedaldi, A.; Zisserman, A. Learning local feature descriptors using convex optimisation. *IEEE Trans. Pattern Anal. Mach. Intell.* **2014**, *36*, 1573–1585.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Yao, T.; Qu, C.; Liu, Q.; Deng, R.; Tian, Y.; Xu, J.; Jha, A.; Bao, S.; Zhao, M.; Fogo, A.B.; et al. Compound figure separation of biomedical images with side loss. In *Deep Generative Models, and Data Augmentation, Labelling, and Imperfections*; Springer: Berlin/Heidelberg, Germany, 2021; pp. 173–183.
- Zhao, M.; Jha, A.; Liu, Q.; Millis, B.A.; Mahadevan-Jansen, A.; Lu, L.; Landman, B.A.; Tyska, M.J.; Huo, Y. Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking. *Med. Image Anal.* **2021**, *71*, 102048.
- Jin, B.; Cruz, L.; Gonçalves, N. Pseudo RGB-D face recognition. *IEEE Sens. J.* **2022**, *22*, 21780–21794.
- Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? *Adv. Neural Inf. Process. Syst.* **2014**, *27*, 3320–3328.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Feng, X.; Pietikäinen, M.; Hadid, A. Facial expression recognition based on local binary patterns. *Pattern Recognit. Image Anal.* **2007**, *17*, 592–598.
- Liew, C.F.; Yairi, T. Facial expression recognition and analysis: A comparison study of feature descriptors. *IPSJ Trans. Comput. Vis. Appl.* **2015**, *7*, 104–120.
- Zhao, X.; Shi, X.; Zhang, S. Facial expression recognition via deep learning. *IETE Tech. Rev.* **2015**, *32*, 347–355.
- Mollahosseini, A.; Chan, D.; Mahoor, M.H. Going deeper in facial expression recognition using deep neural networks. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–10.
- Shima, Y.; Omori, Y. Image augmentation for classifying facial expression images by using deep neural network pre-trained with object image database. In Proceedings of the 3rd International Conference on Robotics, Control and Automation, Chengdu, China, 19–22 July 2018; pp. 140–146.
- Saeed, S.; Baber, J.; Bakhtyar, M.; Ullah, I.; Sheikh, N.; Dad, I.; Sanjrani, A.A. Empirical evaluation of SVM for facial expression recognition. *Int. J. Adv. Comput. Sci. Appl.* **2018**, *9*.
- Sun, N.; Li, Q.; Huan, R.; Liu, J.; Han, G. Deep spatial-temporal feature fusion for facial expression recognition in static images. *Pattern Recognit. Lett.* **2019**, *119*, 49–61.
- Goyani, M.M.; Patel, N.M. Multi-level Haar wavelet based facial expression recognition using logistic regression. *Int. J. Next Gener. Comput.* **2018**, *10*, 131–151.
- Li, K.; Jin, Y.; Akram, M.W.; Han, R.; Chen, J. Facial expression recognition with convolutional neural networks via a new face cropping and rotation strategy. *Vis. Comput.* **2020**, *36*, 391–404.
- Shi, C.; Tan, C.; Wang, L. A facial expression recognition method based on a multibranch cross-connection convolutional neural network. *IEEE Access* **2021**, *9*, 39255–39274.
- Aouayeb, M.; Hamidouche, W.; Soladie, C.; Kpalma, K.; Seguier, R. Learning vision transformer with squeeze and excitation for facial expression recognition. *arXiv* **2021**, arXiv:2107.03107.
- Happy, S.; Routray, A. Automatic facial expression recognition using features of salient facial patches. *IEEE Trans. Affect. Comput.* **2014**, *6*, 1–12.
- Alshamsi, H.; Kepuska, V.M.H. Real time automated facial expression recognition app development on smart phones. In Proceedings of the 8th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 3–5 October 2017; pp. 384–392.
- Wang, K.; Peng, X.; Yang, J.; Lu, S.; Qiao, Y. Suppressing uncertainties for large-scale facial expression recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6897–6906.
- Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1717–1724.
- Jain, N.; Kumar, S.; Kumar, A.; Shamsolmoali, P.; Zareapoor, M. Hybrid deep neural networks for face emotion recognition. *Pattern Recognit. Lett.* **2018**, *115*, 101–106.
- Yang, B.; Cao, J.; Ni, R.; Zhang, Y. Facial expression recognition using weighted mixture deep neural network based on double-channel facial images. *IEEE Access* **2017**, *6*, 4630–4640.
- Sun, X.; Lv, M. Facial expression recognition based on a hybrid model combining deep and shallow features. *Cogn. Comput.* **2019**, *11*, 587–597.
- Gan, Y.; Chen, J.; Yang, Z.; Xu, L. Multiple attention network for facial expression recognition. *IEEE Access* **2020**, *8*, 7383–7393.
- Zhang, H.; Huang, B.; Tian, G. Facial expression recognition based on deep convolution long short-term memory networks of double-channel weighted mixture. *Pattern Recognit. Lett.* **2020**, *131*, 128–134.

**Figure 8.** (**a**) Training and validation accuracy on the CK+ dataset; (**b**) training and validation loss on the CK+ dataset.

Stage | Operator | Resolution | Output Features | Layers
---|---|---|---|---
1 | Conv 3 × 3 | 224 × 224 | 32 | 1
2 | MBConv1, k3 × 3 | 112 × 112 | 16 | 1
3 | MBConv6, k3 × 3 | 112 × 112 | 24 | 2
4 | MBConv6, k5 × 5 | 56 × 56 | 40 | 2
5 | MBConv6, k3 × 3 | 28 × 28 | 80 | 3
6 | MBConv6, k5 × 5 | 14 × 14 | 112 | 3
7 | MBConv6, k5 × 5 | 14 × 14 | 192 | 4
8 | MBConv6, k3 × 3 | 7 × 7 | 320 | 1
9 | Conv 1 × 1 & Pooling & FC | 7 × 7 | 1280 | 1

Parameter | Value
---|---
Epochs | 100–150
Batch Size | 32 or 64
Dropout Rate | 0.001
Optimizer | Adam
Loss Function | Categorical Cross-Entropy
Early Stop | Enabled

S.No | Dataset | Anger | Fear | Happy | Surprise | Disgust | Sadness | Neutral | Total Images
---|---|---|---|---|---|---|---|---|---
1 | CK+48 | 75 | 207 | 249 | 77 | 84 | 135 | NA | 927
2 | JAFFE | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 180
3 | KDEF | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 490
4 | FER2013 | 5121 | 8989 | 4002 | 547 | 6077 | 4953 | 6198 | 35,887

Parameter | Value
---|---
Zoom | 0.15
Width Shift | 0.2
Range of Brightness | 0.6–1.2
Shear | 0.15
Height Shift | 0.2
Fill Mode | Nearest
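The shift and brightness augmentations in the table can be sketched directly on an image array (a NumPy illustration of the transforms; the paper presumably applied them through a standard augmentation pipeline such as Keras's `ImageDataGenerator`):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((48, 48, 3))  # one 48x48 RGB face image, values in [0, 1]

# Width shift: translate horizontally by up to 20% of the width, filling the
# exposed columns with the nearest edge pixels ("nearest" fill mode).
shift = int(0.2 * img.shape[1] * rng.uniform(-1, 1))
shifted = np.roll(img, shift, axis=1)
if shift > 0:
    shifted[:, :shift] = img[:, [0]]    # repeat the left edge column
elif shift < 0:
    shifted[:, shift:] = img[:, [-1]]   # repeat the right edge column

# Brightness: scale intensities by a factor drawn from the [0.6, 1.2] range.
factor = rng.uniform(0.6, 1.2)
brightened = np.clip(shifted * factor, 0.0, 1.0)
```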

Parameter | Value
---|---
Max Depth | 7
eta | 0.2
Number of Classes | {6, 7} based on dataset
Objective | softmax, softprob
Eval_Metric | merror
alpha | default
gamma | default
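These settings map onto an XGBoost parameter dictionary roughly as follows (a sketch using the library's standard parameter names; `multi:softmax` is the usual spelling of the softmax objective, and the class count shown assumes a 7-emotion dataset):

```python
# Illustrative parameter sketch following the table above, not verbatim code
# from the paper.
params = {
    "max_depth": 7,                # depth limit for each boosted tree
    "eta": 0.2,                    # learning rate
    "num_class": 7,                # 6 or 7 depending on the dataset
    "objective": "multi:softmax",  # or "multi:softprob" for class probabilities
    "eval_metric": "merror",       # multiclass classification error
}
num_boost_round = 200              # boosting rounds, as in Algorithm 3

# Training would then look like:
#   booster = xgboost.train(params, xgboost.DMatrix(train_x, label=train_y),
#                           num_boost_round=num_boost_round)
```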

**Table 6.** Base-model EfficientNet experimental results on the CK+, FER2013, JAFFE, and KDEF datasets (LR: learning rate; Val-Loss: validation loss).

S.No | Dataset | LR | Val-Loss | Train Acc. (%) | Validation Acc. (%) | Test Acc. (%)
---|---|---|---|---|---|---
1 | CK+ | $2.499\times {10}^{-4}$ | 0.1368 | 94.35 | 95.714 | 94.41
2 | FER2013 | 0.145 | - | 90.35 | 61.44 | 61.54
3 | JAFFE | - | 0.7315 | 98.44 | 98.44 | 97.67
4 | KDEF | - | 0.4512 | 96.54 | 94.15 | 93.74

Model Name | Accuracy (%)
---|---
Yang et al. [32] | 97.02
Sun and Lv [33] | 94.82
Goyani and Patel [23] | 98.73
Gan et al. [34] | 94.51
Li et al. [24] | 97.54
WMCNN-LSTM [35] | 97.50
Sun et al. [22] | 98.38
MBCC-CNN [25] | 98.48
EfficientNet-XGBoost (Proposed Model) | 100

Model Name | Accuracy (%)
---|---
Alshamsi et al. [28] | 90.80
Ruiz-Garcia et al. [18] | 92.52
Jain et al. [31] | 94.91
EfficientNet-XGBoost (Proposed) | 98.44

Model Name | Accuracy (%)
---|---
Aouayeb et al. [26] | 94.83
Yang et al. [32] | 92.2
Minaee et al. [2] | 92.8
Happy and Routray [27] | 91.8
Alshamsi et al. [28] | 91.90
Zhao and Zhang [35] | 90.95
EfficientNet-XGBoost (Proposed) | 98.3

Model Name | Accuracy (%)
---|---
VGG-19 | 70.80
EfficientNet-B0 | 70.80
GoogleNet | 71.97
ResNet34 | 72.42
Inception V3 | 72.72
BAM-ResNet50 | 73.12
DenseNet121 | 73.16
ResNet152 | 73.16
EfficientNet-XGBoost (Proposed) | 72.54

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Punuri, S.B.; Kuanar, S.K.; Kolhar, M.; Mishra, T.K.; Alameen, A.; Mohapatra, H.; Mishra, S.R.
Efficient Net-XGBoost: An Implementation for Facial Emotion Recognition Using Transfer Learning. *Mathematics* **2023**, *11*, 776.
https://doi.org/10.3390/math11030776
