Augmenting Aquaculture Efficiency through Involutional Neural Networks and Self-Attention for Oplegnathus Punctatus Feeding Intensity Classification from Log Mel Spectrograms
Simple Summary
Abstract
1. Introduction
2. Literature Review
3. Motivation and Significant Contributions
- The utilization of an Involutional Neural Network (INN) architecture for fish behavior analysis.
- The use of Log Mel Spectrograms converted from audio waveforms, followed by dedicated feature extraction applied to those spectrograms.
- Our proposed method prioritizes efficiency without compromising accuracy, making it suitable for real-time deployment in aquaculture environments.
4. Materials and Methods
4.1. Experimental Setup and Materials
4.1.1. Audio and Video Acquisition
4.1.2. Audio and Video Preprocessing
4.2. Proposed Approach
4.2.1. Log Mel Spectrogram
- $S(f, t)$ is the Mel spectrogram magnitude at frequency $f$ and time $t$,
- $H_m(f)$ represents the filter bank response for the $m$-th Mel filter at frequency $f$ in the Mel filter bank $H$,
- $|X_m(t)|$ denotes the magnitude spectrum of the audio signal in the $m$-th Mel filter at time $t$.
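As a minimal sketch of this step (assuming the librosa library; `n_fft`, `hop_length`, and `n_mels` below are illustrative defaults, not the paper's reported settings), the Log Mel spectrogram can be computed as:

```python
import librosa
import numpy as np

def log_mel_spectrogram(path, n_fft=2048, hop_length=512, n_mels=128):
    """Compute a Log Mel spectrogram from an audio file.

    Parameter values are illustrative, not the paper's settings.
    """
    y, sr = librosa.load(path, sr=None)          # keep the native sampling rate
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )                                            # Mel-filtered power spectrum
    return librosa.power_to_db(mel, ref=np.max)  # log compression (dB scale)
```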
4.2.2. Discrete Wavelet Transform (DWT)
Algorithm 1: Denoising the Mel Spectrogram image using DWT

Input: Mel spectrogram image
1. Apply a 2D-DWT to the input and obtain the approximation and detail coefficients (level-3 decomposition).
2. For each decomposition level, for each detail coefficient k at level 3: discard the coefficient ▹ Remove high-frequency components
3. Reconstruct the denoised coefficient set.
4. Reconstruct the synthesized image via the inverse 2D-DWT.
Output: Denoised Mel spectrogram
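One runnable reading of Algorithm 1, assuming PyWavelets (`pywt`) and a Haar ("db1") mother wavelet; the paper does not specify the wavelet or exactly which detail bands are discarded, so both choices are assumptions here (the sketch zeroes the finest-scale details):

```python
import numpy as np
import pywt

def dwt_denoise(mel_img, wavelet="db1", level=3):
    """Denoise a Mel spectrogram image with a level-3 2D DWT.

    The wavelet and the choice of which detail bands to zero are
    assumptions; Algorithm 1 only says high-frequency components
    are removed before reconstruction.
    """
    coeffs = pywt.wavedec2(mel_img, wavelet=wavelet, level=level)
    # coeffs = [cA_3, (cH_3, cV_3, cD_3), ..., (cH_1, cV_1, cD_1)]
    # Zero the finest-scale (highest-frequency) detail coefficients.
    coeffs[-1] = tuple(np.zeros_like(d) for d in coeffs[-1])
    return pywt.waverec2(coeffs, wavelet=wavelet)
```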
4.2.3. Gabor Filter
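A sketch of the Gabor filtering step, assuming OpenCV; the kernel size, orientation, wavelength, and other parameters below are illustrative placeholders, since the paper's exact filter-bank settings are not reproduced here:

```python
import cv2
import numpy as np

def gabor_filter(img, ksize=31, sigma=4.0, theta=0.0,
                 lambd=10.0, gamma=0.5, psi=0.0):
    """Single-orientation Gabor filter; all parameters are illustrative."""
    kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                lambd, gamma, psi, ktype=cv2.CV_32F)
    return cv2.filter2D(img.astype(np.float32), cv2.CV_32F, kernel)
```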
4.2.4. Local Binary Pattern
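A hedged sketch of the LBP step using scikit-image; the neighborhood size `P`, radius `R`, and the "uniform" variant are assumptions, not the paper's confirmed configuration:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_image(img, points=8, radius=1):
    """Classic LBP texture encoding (Ojala et al.); P and R are illustrative."""
    rng = np.ptp(img) + 1e-8                        # avoid division by zero
    img8 = np.uint8(255 * (img - img.min()) / rng)  # rescale to 8-bit
    return local_binary_pattern(img8, P=points, R=radius, method="uniform")
```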
4.2.5. Laplacian High Pass Filter
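A minimal sketch of the Laplacian high-pass filtering step with OpenCV; the 3 × 3 kernel size is an assumption:

```python
import cv2
import numpy as np

def laplacian_hpf(img, ksize=3):
    """Laplacian high-pass filter to accentuate edges and fine texture."""
    lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F, ksize=ksize)
    return cv2.convertScaleAbs(lap)   # absolute response as an 8-bit image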
4.2.6. Combination of Extracted Features
Algorithm 2: Preprocessing of the Log Mel Spectrogram image and resulting feature image

Input: Mel spectrogram image
Step 1: Denoising with DWT
Step 2: Filtering with Gabor filter
Step 3: Operation with LBP
Step 4: Filtering with Laplacian High Pass Filter (LHPF)
Step 5: Concatenation of the processed outputs
Output: Concatenated processed image
Return the concatenated image
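Putting the steps together, a hedged sketch of Algorithm 2 reusing the hypothetical helper functions from the preceding subsections; which processed maps are stacked, and in what order, is an assumption, as the paper only states that the outputs are concatenated into one feature image:

```python
import cv2
import numpy as np

def preprocess_log_mel(mel_img, size=(224, 224)):
    """Algorithm 2 sketch: DWT -> Gabor -> LBP -> LHPF -> concatenation.

    The channel-stacking choice below is an assumption made to match the
    model's (224, 224, 3) input shape.
    """
    den = dwt_denoise(mel_img)        # Step 1: DWT denoising
    gab = gabor_filter(den)           # Step 2: Gabor filtering
    lbp = lbp_image(gab)              # Step 3: LBP operation
    hpf = laplacian_hpf(lbp)          # Step 4: Laplacian high-pass filtering
    # Step 5: concatenate processed maps channel-wise into one input tensor.
    feats = [cv2.resize(np.float32(f), size) for f in (gab, lbp, hpf)]
    return np.stack(feats, axis=-1)   # shape (224, 224, 3)
```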
4.3. Network Architecture
4.3.1. Involutional Layer with Self-Attention
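The involution operation can be sketched in Keras along the lines of the public Keras involution example (itself after Li et al., CVPR 2021). This is an assumption-laden reimplementation rather than the authors' exact code, although the paired output shapes and the per-layer parameter counts it produces match the model summary table below:

```python
import tensorflow as tf
from tensorflow import keras

class Involution(keras.layers.Layer):
    """Involution layer sketch: a small network generates a spatial kernel
    per pixel, which is then applied to the local patch at that pixel."""

    def __init__(self, channel, group_number=1, kernel_size=3,
                 stride=1, reduction_ratio=2, **kwargs):
        super().__init__(**kwargs)
        self.channel = channel
        self.group_number = group_number
        self.kernel_size = kernel_size
        self.stride = stride
        self.reduction_ratio = reduction_ratio

    def build(self, input_shape):
        (_, height, width, num_channels) = input_shape
        height, width = height // self.stride, width // self.stride
        self.stride_layer = (
            keras.layers.AveragePooling2D(pool_size=self.stride,
                                          strides=self.stride, padding="same")
            if self.stride > 1 else tf.identity
        )
        # Kernel-generation network: bottleneck set by the reduction ratio.
        self.kernel_gen = keras.Sequential([
            keras.layers.Conv2D(self.channel // self.reduction_ratio, kernel_size=1),
            keras.layers.BatchNormalization(),
            keras.layers.ReLU(),
            keras.layers.Conv2D(
                self.kernel_size * self.kernel_size * self.group_number,
                kernel_size=1),
        ])
        self.kernel_reshape = keras.layers.Reshape(
            (height, width, self.kernel_size * self.kernel_size,
             1, self.group_number))
        self.patches_reshape = keras.layers.Reshape(
            (height, width, self.kernel_size * self.kernel_size,
             num_channels // self.group_number, self.group_number))
        self.output_reshape = keras.layers.Reshape((height, width, num_channels))

    def call(self, x):
        # Dynamic kernel for every spatial position.
        kernel = self.kernel_reshape(self.kernel_gen(self.stride_layer(x)))
        # Extract the local patch around every position.
        patches = tf.image.extract_patches(
            images=x, sizes=[1, self.kernel_size, self.kernel_size, 1],
            strides=[1, self.stride, self.stride, 1],
            rates=[1, 1, 1, 1], padding="SAME")
        patches = self.patches_reshape(patches)
        # Multiply-accumulate the per-pixel kernel with its patch.
        out = tf.reduce_sum(kernel * patches, axis=3)
        return self.output_reshape(out), kernel
```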
4.3.2. Involutional Neural Network Architecture
- Layer 1: The first layer of the model performs involution operations, as shown in Figure 10a. It has three output channels and uses a dynamic kernel of size 3 × 3. The stride is set to 1, and the reduction ratio is 2. The involution operation is followed by ReLU activation and a max-pooling layer with a pool size of 2 × 2.
- Layer 2: The second layer uses involution operations with the same parameters: a dynamic 3 × 3 kernel, three output channels, a reduction ratio of 2, and a stride of 1. ReLU activation follows the involution operation, and a max-pooling layer with a pool size of 2 × 2 is then applied.
- Layer 3: The third layer continues with involution operations, again with three output channels, a dynamic 3 × 3 kernel, a stride of 1, and a reduction ratio of 2. The involution operation is followed by ReLU activation and a max-pooling layer with a pool size of 2 × 2.
- Layer 4: The fourth layer reshapes the output of the third layer into a format suitable for further processing. It reshapes the output into ‘(height × width, 3)’.
- Layer 5: The fifth layer applies a self-attention mechanism to capture dependencies between different patches of the feature map as illustrated in Figure 10b. It uses an attention width of 15 and a sigmoid activation function.
- Layer 6: Following the self-attention layer, the output is flattened to prepare it for the fully connected layers.
- Layer 7: The seventh layer is a fully connected dense layer with 32 hidden units and ReLU activation.
- Layer 8: The final layer is another fully connected dense layer with three output units, corresponding to the number of classes in the dataset. It has no explicit activation function; softmax is applied during model compilation for probabilistic classification. The full architecture is assembled in the sketch below.
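A sketch assembling Layers 1–8 as described above, assuming TensorFlow/Keras, the `Involution` class sketched in Section 4.3.1, and the keras-self-attention package's `SeqSelfAttention`; hyperparameters follow the layer list, and helper names such as `build_inn` are hypothetical:

```python
import tensorflow as tf
from tensorflow import keras
from keras_self_attention import SeqSelfAttention  # pip install keras-self-attention

def build_inn(input_shape=(224, 224, 3), num_classes=3):
    inputs = keras.Input(shape=input_shape)
    x = keras.layers.Rescaling(1.0 / 255)(inputs)
    for _ in range(3):                                  # Layers 1-3: involution blocks
        x, _ = Involution(channel=3, group_number=1, kernel_size=3,
                          stride=1, reduction_ratio=2)(x)
        x = keras.layers.ReLU()(x)
        x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
    h, w = x.shape[1], x.shape[2]                       # 28 x 28 after three pools
    x = keras.layers.Reshape((h * w, 3))(x)             # Layer 4: (height x width, 3)
    x = SeqSelfAttention(attention_width=15,            # Layer 5: self-attention
                         attention_activation="sigmoid")(x)
    x = keras.layers.Flatten()(x)                       # Layer 6
    x = keras.layers.Dense(32, activation="relu")(x)    # Layer 7
    logits = keras.layers.Dense(num_classes)(x)         # Layer 8: logits only
    model = keras.Model(inputs, logits)
    # Softmax is applied through the loss at compile time, as described above.
    model.compile(optimizer="adam",
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    return model
```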
4.4. Performance Evaluation Metrics
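The tables in Section 5 report accuracy, MWP, MWR, and F1-score. Assuming MWP and MWR denote mean (support-)weighted precision and recall, which is the usual reading of these abbreviations but not confirmed by the excerpt, the standard definitions are:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{\sum_{c} TP_c}{N}, &
\text{MWP} &= \sum_{c} \frac{N_c}{N}\cdot\frac{TP_c}{TP_c + FP_c}, \\
\text{MWR} &= \sum_{c} \frac{N_c}{N}\cdot\frac{TP_c}{TP_c + FN_c}, &
\text{F1} &= \sum_{c} \frac{N_c}{N}\cdot\frac{2\,P_c R_c}{P_c + R_c},
\end{aligned}
```

where $N_c$ is the number of samples in class $c$, $N$ the total sample count, $TP_c$, $FP_c$, and $FN_c$ the true positives, false positives, and false negatives for class $c$, and $P_c$, $R_c$ the per-class precision and recall.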
5. Results and Discussion
5.1. Results and Comparison
5.2. Involutional Neural Networks Compared with Convolutional Neural Networks
5.3. Potential Deployment of INN Model on Edge Devices
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Liao, J.; Zhang, X.; Kang, S.; Zhang, L.; Zhang, D.; Xu, Z.; Qin, Q.; Wei, J. Establishment and characterization of a brain tissue cell line from spotted knifejaw (Oplegnathus punctatus) and its susceptibility to several fish viruses. J. Fish Dis. 2023, 46, 767–777.
- Boyd, C.E.; D’Abramo, L.R.; Glencross, B.D.; Huyben, D.C.; Juarez, L.M.; Lockwood, G.S.; McNevin, A.A.; Tacon, A.G.J.; Teletchea, F.; Tomasso, J.R., Jr.; et al. Achieving sustainable aquaculture: Historical and current perspectives and future needs and challenges. J. World Aquac. Soc. 2020, 51, 578–633.
- Kandathil Radhakrishnan, D.; AkbarAli, I.; Schmidt, B.V.; John, E.M.; Sivanpillai, S.; Thazhakot Vasunambesan, S. Improvement of nutritional quality of live feed for aquaculture: An overview. Aquac. Res. 2020, 51, 1–17.
- Wang, J.; Rongxing, H.; Han, T.; Zheng, P.; Xu, H.; Su, H.; Wang, Y. Dietary protein requirement of juvenile spotted knifejaw Oplegnathus punctatus. Aquac. Rep. 2021, 21, 100874.
- Li, D.; Wang, Z.; Wu, S.; Miao, Z.; Du, L.; Duan, Y. Automatic recognition methods of fish feeding behavior in aquaculture: A review. Aquaculture 2020, 528, 735508.
- Liu, Z.; Li, X.; Fan, L.; Lu, H.; Liu, L.; Liu, Y. Measuring feeding activity of fish in RAS using computer vision. Aquac. Eng. 2014, 60, 20–27.
- Mushtaq, Z.; Su, S.F. Environmental sound classification using a regularized deep convolutional neural network with data augmentation. Appl. Acoust. 2020, 167, 107389.
- Mushtaq, Z.; Su, S.F.; Tran, Q.V. Spectral images based environmental sound classification using CNN with meaningful data augmentation. Appl. Acoust. 2021, 172, 107581.
- Khan, A.A.; Raza, S.; Qureshi, M.F.; Mushtaq, Z.; Taha, M.; Amin, F. Deep Learning-Based Classification of Wheat Leaf Diseases for Edge Devices. In Proceedings of the 2023 2nd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE), Lahore, Pakistan, 27–29 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Singh, A.; Mushtaq, Z.; Abosaq, H.A.; Mursal, S.N.F.; Irfan, M.; Nowakowski, G. Enhancing Ransomware Attack Detection Using Transfer Learning and Deep Learning Ensemble Models on Cloud-Encrypted Data. Electronics 2023, 12, 3899.
- Qamar, H.G.M.; Qureshi, M.F.; Mushtaq, Z.; Zubariah, Z.; Rehman, M.Z.U.; Samee, N.A.; Mahmoud, N.F.; Gu, Y.H.; Al-masni, M.A. EMG gesture signal analysis towards diagnosis of upper limb using dual-pathway convolutional neural network. Math. Biosci. Eng. 2024, 21, 5712–5734.
- Qureshi, M.F.; Mushtaq, Z.; ur Rehman, M.Z.; Kamavuako, E.N. Spectral image-based multiday surface electromyography classification of hand motions using CNN for human–computer interaction. IEEE Sens. J. 2022, 22, 20676–20683.
- Shahzad, A.; Mushtaq, A.; Sabeeh, A.Q.; Ghadi, Y.Y.; Mushtaq, Z.; Arif, S.; ur Rehman, M.Z.; Qureshi, M.F.; Jamil, F. Automated Uterine Fibroids Detection in Ultrasound Images Using Deep Convolutional Neural Networks. Healthcare 2023, 11, 1493.
- Afshan, N.; Mushtaq, Z.; Alamri, F.S.; Qureshi, M.F.; Khan, N.A.; Siddique, I. Efficient thyroid disorder identification with weighted voting ensemble of super learners by using adaptive synthetic sampling technique. AIMS Math. 2023, 8, 24274–24309.
- Khalil, S.; Nawaz, U.; Zubariah; Mushtaq, Z.; Arif, S.; ur Rehman, M.Z.; Qureshi, M.F.; Malik, A.; Aleid, A.; Alhussaini, K. Enhancing Ductal Carcinoma Classification Using Transfer Learning with 3D U-Net Models in Breast Cancer Imaging. Appl. Sci. 2023, 13, 4255.
- Du, Z.; Cui, M.; Wang, Q.; Liu, X.; Xu, X.; Bai, Z.; Sun, C.; Wang, B.; Wang, S.; Li, D. Feeding intensity assessment of aquaculture fish using Mel Spectrogram and deep learning algorithms. Aquac. Eng. 2023, 102, 102345.
- Føre, M.; Frank, K.; Norton, T.; Svendsen, E.; Alfredsen, J.A.; Dempster, T.; Eguiraun, H.; Watson, W.; Stahl, A.; Sunde, L.M.; et al. Precision fish farming: A new framework to improve production in aquaculture. Biosyst. Eng. 2018, 173, 176–193.
- O’Donncha, F.; Stockwell, C.L.; Planellas, S.R.; Micallef, G.; Palmes, P.; Webb, C.; Filgueira, R.; Grant, J. Data Driven Insight Into Fish Behaviour and Their Use for Precision Aquaculture. Front. Anim. Sci. 2021, 2, 695054.
- Cui, M.; Liu, X.; Zhao, J.; Sun, J.; Lian, G.; Chen, T.; Plumbley, M.D.; Li, D.; Wang, W. Fish Feeding Intensity Assessment in Aquaculture: A New Audio Dataset AFFIA3K and a Deep Learning Algorithm. In Proceedings of the 2022 IEEE 32nd International Workshop on Machine Learning for Signal Processing (MLSP), Xi’an, China, 22–25 August 2022; pp. 1–6.
- Ubina, N.; Cheng, S.C.; Chang, C.C.; Chen, H.Y. Evaluating fish feeding intensity in aquaculture with convolutional neural networks. Aquac. Eng. 2021, 94, 102178.
- Zhou, C.; Xu, D.; Chen, L.; Zhang, S.; Sun, C.; Yang, X.; Wang, Y. Evaluation of fish feeding intensity in aquaculture using a convolutional neural network and machine vision. Aquaculture 2019, 507, 457–465.
- Du, Z.; Xu, X.; Bai, Z.; Liu, X.; Hu, Y.; Li, W.; Wang, C.; Li, D. Feature fusion strategy and improved GhostNet for accurate recognition of fish feeding behavior. Comput. Electron. Agric. 2023, 214, 108310.
- Zhang, Y.; Xu, C.; Du, R.; Kong, Q.; Li, D.; Liu, C. MSIF-MobileNetV3: An improved MobileNetV3 based on multi-scale information fusion for fish feeding behavior analysis. Aquac. Eng. 2023, 102, 102338.
- Zhang, L.; Wang, J.; Li, B.; Liu, Y.; Zhang, H.; Duan, Q. A MobileNetV2-SENet-based method for identifying fish school feeding behavior. Aquac. Eng. 2022, 99, 102288.
- Yang, L.; Yu, H.; Cheng, Y.; Mei, S.; Duan, Y.; Li, D.; Chen, Y. A dual attention network based on EfficientNet-B2 for short-term fish school feeding behavior analysis in aquaculture. Comput. Electron. Agric. 2021, 187, 106316.
- Feng, S.; Yang, X.; Liu, Y.; Zhao, Z.; Liu, J.; Yan, Y.; Zhou, C. Fish feeding intensity quantification using machine vision and a lightweight 3D ResNet-GloRe network. Aquac. Eng. 2022, 98, 102244.
- Zeng, Y.; Yang, X.; Pan, L.; Zhu, W.; Wang, D.; Zhao, Z.; Liu, J.; Sun, C.; Zhou, C. Fish school feeding behavior quantification using acoustic signal and improved Swin Transformer. Comput. Electron. Agric. 2023, 204, 107580.
- Kong, Q.; Du, R.; Duan, Q.; Zhang, Y.; Chen, Y.; Li, D.; Xu, C.; Li, W.; Liu, C. A recurrent network based on active learning for the assessment of fish feeding status. Comput. Electron. Agric. 2022, 198, 106979.
- Hu, W.C.; Chen, L.B.; Huang, B.K.; Lin, H.M. A Computer Vision-Based Intelligent Fish Feeding System Using Deep Learning Techniques for Aquaculture. IEEE Sens. J. 2022, 22, 7185–7194.
- Zheng, K.; Yang, R.; Li, R.; Guo, P.; Yang, L.; Qin, H. A spatiotemporal attention network-based analysis of golden pompano school feeding behavior in an aquaculture vessel. Comput. Electron. Agric. 2023, 205, 107610.
- Jayasundara, J.M.V.D.B.; Ramanayake, R.M.L.S.; Senarath, H.M.N.B.; Herath, H.M.S.L.; Godaliyadda, G.M.R.I.; Ekanayake, M.P.B.; Herath, H.M.V.R.; Ariyawansa, S. Deep learning for automated fish grading. J. Agric. Food Res. 2023, 14, 100711.
- Irfan, M.; Mushtaq, Z.; Khan, N.A.; Althobiani, F.; Mursal, S.N.F.; Rahman, S.; Magzoub, M.A.; Latif, M.A.; Yousufzai, I.K. Improving Bearing Fault Identification by Using Novel Hybrid Involution-Convolution Feature Extraction With Adversarial Noise Injection in Conditional GANs. IEEE Access 2023, 11, 118253–118267.
- Qureshi, M.F.; Mushtaq, Z.; Rehman, M.Z.U.; Kamavuako, E.N. E2CNN: An Efficient Concatenated CNN for Classification of Surface EMG Extracted From Upper Limb. IEEE Sens. J. 2023, 23, 8989–8996.
- Sigurdsson, S.; Petersen, K.B.; Lehn-Schiøler, T. Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music. In Proceedings of the 7th International Conference on Music Information Retrieval, Victoria, BC, Canada, 8–12 October 2006; pp. 286–289.
- Othman, G.; Zeebaree, D.Q. The Applications of Discrete Wavelet Transform in Image Processing: A Review. J. Soft Comput. Data Min. 2020, 1, 31–43.
- Hammouche, R.; Attia, A.; Akhrouf, S.; Akhtar, Z. Gabor filter bank with deep autoencoder based face recognition system. Expert Syst. Appl. 2022, 197, 116743.
- Vu, H.N.; Nguyen, M.H.; Pham, C. Masked face recognition with convolutional neural networks and local binary patterns. Appl. Intell. 2022, 52, 5497–5512.
- Fu, G.; Zhao, P.; Bian, Y. p-Laplacian Based Graph Neural Networks. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 6878–6917.
- Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693.
- Jain, A.K.; Farrokhnia, F. Unsupervised texture segmentation using Gabor filters. Pattern Recognit. 1991, 24, 1167–1186.
- Ojala, T.; Pietikäinen, M.; Harwood, D. A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. 1996, 29, 51–59.
- Gonzalez, R.C. Digital Image Processing; Pearson Education: Chennai, India, 2009.
Reference | Year | Technique/Model | Classes | Accuracy
---|---|---|---|---
[16] | 2023 | MobileNetV3-SBSC with Mel Spectrogram | Strong, Medium, and None | 85.9% |
[19] | 2022 | CNN with Mel Spectrogram | None, Weak, Medium, and Strong | 74% (mAP) |
[20] | 2021 | 3D CNN with Optical Flow Frames | None, Weak, Medium, and Strong | 95% |
[21] | 2019 | CNN with Machine Vision | - | 90% |
[22] | 2024 | LC-GhostNet with Feature Fusion Strategy | Strong, Medium, Weak, and None | 97.941% |
[23] | 2023 | MSIF-MobileNetV3 | - | 96.4% |
[24] | 2022 | MobileNetV2-SENet | - | 97.76% |
[25] | 2021 | Dual attention network with Efficientnet-B2 | - | 89.56% |
[26] | 2022 | 3D ResNet-GloRe network | - | 92.68% |
[27] | 2023 | ASST with improved Swin Transformer | Strong, Medium, Weak, and None | 96.16% |
[28] | 2022 | VGG16 with Active Learning | - | 98% |
[29] | 2022 | Computer Vision for Fish Feeding | - | 93.2% |
[30] | 2023 | Spatiotemporal Attention Network (STAN) | - | 97.97% |
[31] | 2023 | FishNET-S and FishNET-T | - | 84.1%, 68.3% |
Feeding Activity | Description
---|---
Strong | Fish move freely between food items and consume all the available food.
Medium | Fish move to take food but return to their original positions.
None | Fish do not respond to food.
Layer | Output Shape | Param #
---|---|---
InputLayer | [(None, 224, 224, 3)] | 0
Rescaling | (None, 224, 224, 3) | 0
Involution | ((None, 224, 224, 3), (None, 224, 224, 9, 1, 1)) | 26
ReLU | (None, 224, 224, 3) | 0
MaxPooling2D | (None, 112, 112, 3) | 0
Involution | ((None, 112, 112, 3), (None, 112, 112, 9, 1, 1)) | 26
ReLU | (None, 112, 112, 3) | 0
MaxPooling2D | (None, 56, 56, 3) | 0
Involution | ((None, 56, 56, 3), (None, 56, 56, 9, 1, 1)) | 26
ReLU | (None, 56, 56, 3) | 0
MaxPooling2D | (None, 28, 28, 3) | 0
Reshape | (None, 784, 3) | 0
SeqSelfAttention | (None, 784, 3) | 257
Flatten | (None, 2352) | 0
Dense | (None, 32) | 75,296
Dense | (None, 32) | 1056
Dense | (None, 3) | 99

Trainable params: 76,780 (299.92 KB); Non-trainable params: 6 (24.00 Byte); Total params: 76,786 (299.95 KB)
Dataset | Accuracy | MWP | MWR | F1-Score
---|---|---|---|---
3-s | 97.11% | 97.10% | 97.10% | 97.10%
4-s | 92.73% | 92.74% | 92.74% | 92.74%
5-s | 90.56% | 90.53% | 90.53% | 90.54%
Model | Parameters | Size | Accuracy | MWP | MWR | F1-Score | Training Time (s) | Inference Time (s)
---|---|---|---|---|---|---|---|---
VGG-16 | 16,320,579 | 62.26 MB | 95.84% | 95.85% | 95.85% | 95.84% | 509.127 | 2.551
VGG-19 | 21,630,275 | 82.51 MB | 90.56% | 90.53% | 90.53% | 90.53% | 591.452 | 2.337
ResNet50 | 30,010,499 | 114.48 MB | 97.86% | 97.86% | 97.88% | 97.88% | 424.957 | 1.711
Xception | 27,284,267 | 104.08 MB | 93.90% | 93.89% | 93.92% | 93.92% | 550.481 | 2.232
EfficientNet-B0 | 8,063,910 | 30.76 MB | 95.40% | 95.39% | 95.39% | 95.39% | 320.851 | 1.236
InceptionNetV3 | 25,079,843 | 95.67 MB | 93.26% | 93.21% | 93.22% | 93.21% | 340.652 | 1.403
MobileNet-V2 | 6,272,323 | 23.93 MB | 91.97% | 91.97% | 91.97% | 91.99% | 216.605 | 0.884
Proposed INN | 76,786 | 299.95 KB | 97.11% | 97.10% | 97.10% | 97.10% | 129.401 | 0.553