Design of Multi-Task Parallel Model Based on Fuzzy Neural Networks and Joint Gradient Descent Algorithm
Abstract
1. Introduction
- (1) A multi-task parallel model (MTPM) based on FNNs is established, comprising a task-shared structure and a task-independent structure. The shared structure consists of hidden-layer neurons shared across tasks to capture the relevant information between tasks, while the independent structure is composed of task-specific neurons that preserve the characteristics of each individual task.
- (2) A joint gradient descent algorithm (JGDA) is proposed to update the parameters of the MTPM. In this algorithm, a gradient-correction discriminant strategy is designed to resolve conflicts between task gradient directions: the cosine of the angle between task gradients is used to evaluate the difference in gradient direction, and gradient correction then guides the optimization of the shared and task-specific parameters.
- (3) An adaptive learning rate strategy is proposed that separates the GLR from the SLR to balance the training speed across tasks. The GLR is derived from the gradient of the task-shared parameters and is used to update the shared parameters, while the SLR depends on the variation in each task's error during training.
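The gradient-direction conflict test described in contribution (2) can be sketched as follows. This is a minimal illustration, not the paper's exact correction (Equations (32) and (33) are not reproduced here): it detects a conflict via the gradient cosine and, as one common realization, projects out the opposing component; the helper names `cosine` and `correct_gradient` are our own.

```python
import numpy as np

def cosine(g_u, g_v):
    """Cosine of the angle between two task gradients."""
    return float(g_u @ g_v) / (np.linalg.norm(g_u) * np.linalg.norm(g_v) + 1e-12)

def correct_gradient(g_u, g_v):
    """If the task gradients conflict (negative cosine), remove from g_u
    the component that opposes g_v (a projection-style correction)."""
    if cosine(g_u, g_v) < 0:
        g_u = g_u - (g_u @ g_v) / (g_v @ g_v + 1e-12) * g_v
    return g_u
```

For example, with g_u = [1, 0] and g_v = [-1, 1] the cosine is negative, and the corrected g_u is orthogonal to g_v, so the shared-parameter update no longer works against the other task.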
2. Preliminaries
2.1. MTL
2.2. Fuzzy Neural Network
3. Multi-Task Modeling Approach Based on JGDA
3.1. MTPM
3.2. JGDA
3.2.1. Multi-Task Parameter Update Method
3.2.2. Joint Gradient Method
3.3. Adaptive Learning Rate Strategy
Algorithm 1. The proposed JGDA.

Input: K: maximum number of iterations; θ: shared parameters θs and the tth task-specific parameters θt.

1. Initialization
   Initialize the parameters θ(c, σ, w), the GLR η, and the SLR ηt.
2. Parameter learning process
   for k = 1:K
       for o = 1:O
           for p = 1:P
               Calculate the outputs of the RBF layer.          % Equations (11) and (12)
               Obtain the output of the normalized layer.       % Equations (13) and (14)
               Calculate the output of the output layer.        % Equation (15)
               Evaluate the error.
           end
           Calculate the task-specific parameter gradients.     % Equation (18)
           Calculate the shared parameter gradients.            % Equation (23)
           for u, v ∈ T
               if gu · gv < 0
                   Calculate the interference direction Ir.
                   Calculate the gradient difference Δgu,v.     % Equation (32)
                   Obtain the modified gradient gu′.            % Equation (33)
               end if
               Calculate the task weights αu, αv.               % Equation (38)
           end
       end
       Calculate the GLR.                                       % Equation (39)
       Calculate the SLR.                                       % Equation (46)
       Calculate the shared parameter update.                   % Equation (44)
       Calculate the task-specific parameter update.            % Equation (45)
   end
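The loop in Algorithm 1 can be sketched end-to-end on a toy problem. This is a heavily simplified stand-in, assuming linear task heads in place of the FNN's RBF, normalized, and output layers, fixed learning rates in place of Equations (39) and (46), and a projection-style correction in place of the paper's exact gradient modification; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy regression tasks that share the same input features.
X = rng.normal(size=(64, 3))
Y = np.stack([X @ np.array([1.0, -2.0, 0.5]),
              X @ np.array([1.5, -1.0, 0.0])], axis=1)

theta_s = np.zeros(3)                 # shared parameters
theta_t = [np.zeros(3), np.zeros(3)]  # task-specific parameters
eta_s, eta_t = 0.05, [0.05, 0.05]     # GLR (shared) and SLRs (per task)

for k in range(200):
    grads, errs = [], []
    for t in range(2):
        pred = X @ (theta_s + theta_t[t])   # shared + specific contribution
        err = pred - Y[:, t]
        g = X.T @ err / len(X)              # gradient of 0.5 * MSE
        grads.append(g)
        errs.append(float(np.mean(err ** 2)))
        theta_t[t] -= eta_t[t] * g          # task-specific update with its SLR
    g_u, g_v = grads
    if g_u @ g_v < 0:                       # conflict: project out the opposition
        g_u = g_u - (g_u @ g_v) / (g_v @ g_v + 1e-12) * g_v
    theta_s -= eta_s * (g_u + g_v)          # joint shared update with the GLR
```

After training, both task errors shrink toward zero because the shared parameters absorb the component common to the two tasks while the specific parameters fit the residual per-task differences.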
4. Convergence Discussion
5. Experimental Studies
5.1. The Experimental Results of the Wind Energy Prediction
5.2. The Experimental Results of Ship Fuel Consumption Prediction
5.3. The Experimental Results of Multi-Site Quality Prediction
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Method | Task | Training RMSE (Mean) | Training RMSE (Dev.) | Training MAE (Mean) | Training MAE (Dev.) | Testing RMSE (Mean) | Testing RMSE (Dev.) | Testing MAE (Mean) | Testing MAE (Dev.) |
|---|---|---|---|---|---|---|---|---|---|
| STL-FNN | WT1 | 17.043 | 1.864 | 13.549 | 1.513 | 11.959 | 0.612 | 8.978 | 0.379 |
| STL-FNN | WT2 | 11.457 | 0.541 | 9.264 | 0.370 | 10.340 | 0.297 | 8.780 | 0.303 |
| MTL-JGDA (fixed learning rate) | WT1 | 11.008 | 0.197 | 7.830 | 0.053 | 7.859 | 0.124 | 5.624 | 0.139 |
| MTL-JGDA (fixed learning rate) | WT2 | 9.828 | 0.376 | 7.000 | 0.459 | 6.757 | 0.364 | 5.082 | 0.133 |
| MTL-JGDA (adaptive learning rate) | WT1 | 11.157 | 0.273 | 7.811 | 0.263 | 7.246 | 0.123 | 5.034 | 0.092 |
| MTL-JGDA (adaptive learning rate) | WT2 | 9.176 | 0.903 | 6.182 | 0.739 | 6.753 | 0.182 | 4.967 | 0.083 |
| MTL-PCGrad | WT1 | 14.449 | 0.590 | 10.509 | 0.629 | 9.361 | 0.311 | 7.092 | 0.202 |
| MTL-PCGrad | WT2 | 14.124 | 1.540 | 10.327 | 1.239 | 9.679 | 0.250 | 7.938 | 0.212 |
| MTL-Gradvac | WT1 | 12.802 | 0.296 | 8.783 | 0.345 | 9.199 | 0.155 | 6.830 | 0.046 |
| MTL-Gradvac | WT2 | 11.409 | 0.375 | 7.974 | 0.414 | 9.194 | 0.143 | 7.303 | 0.172 |
| MTL-GradNorm | WT1 | 12.714 | 0.870 | 9.783 | 0.808 | 9.646 | 0.038 | 7.140 | 0.040 |
| MTL-GradNorm | WT2 | 10.633 | 0.428 | 8.359 | 0.385 | 8.191 | 0.414 | 6.881 | 0.482 |
| Method | Task | Training RMSE (Mean) | Training RMSE (Dev.) | Training MAE (Mean) | Training MAE (Dev.) | Testing RMSE (Mean) | Testing RMSE (Dev.) | Testing MAE (Mean) | Testing MAE (Dev.) |
|---|---|---|---|---|---|---|---|---|---|
| STL-FNN | Main | 9.768 | 2.013 | 5.790 | 1.622 | 4.323 | 0.554 | 3.392 | 0.227 |
| STL-FNN | Auxiliary | 31.839 | 4.942 | 15.117 | 4.420 | 11.181 | 0.942 | 8.809 | 0.719 |
| MTL-JGDA (fixed learning rate) | Main | 4.919 | 0.893 | 3.811 | 0.857 | 2.127 | 0.338 | 1.717 | 0.248 |
| MTL-JGDA (fixed learning rate) | Auxiliary | 20.916 | 7.741 | 7.695 | 6.652 | 8.145 | 2.305 | 5.988 | 1.081 |
| MTL-JGDA (adaptive learning rate) | Main | 3.690 | 2.196 | 2.702 | 1.600 | 1.734 | 0.364 | 1.334 | 0.235 |
| MTL-JGDA (adaptive learning rate) | Auxiliary | 22.653 | 0.691 | 9.744 | 1.457 | 5.956 | 0.695 | 3.915 | 0.283 |
| MTL-PCGrad | Main | 9.002 | 0.887 | 5.330 | 0.953 | 3.078 | 0.456 | 2.600 | 0.164 |
| MTL-PCGrad | Auxiliary | 28.612 | 0.483 | 11.838 | 0.719 | 19.209 | 0.229 | 10.954 | 0.830 |
| MTL-Gradvac | Main | 8.169 | 0.382 | 3.815 | 0.666 | 3.361 | 0.455 | 2.374 | 0.131 |
| MTL-Gradvac | Auxiliary | 29.064 | 1.503 | 11.599 | 2.936 | 19.373 | 1.263 | 13.343 | 2.839 |
| MTL-GradNorm | Main | 9.085 | 2.939 | 6.418 | 2.088 | 3.668 | 0.521 | 2.652 | 0.423 |
| MTL-GradNorm | Auxiliary | 40.075 | 3.306 | 20.961 | 3.828 | 13.103 | 0.428 | 10.173 | 1.058 |
| Method | Task | Training RMSE (Mean) | Training RMSE (Dev.) | Training MAE (Mean) | Training MAE (Dev.) | Testing RMSE (Mean) | Testing RMSE (Dev.) | Testing MAE (Mean) | Testing MAE (Dev.) |
|---|---|---|---|---|---|---|---|---|---|
| STL-FNN | Tiantan | 30.275 | 0.910 | 20.950 | 0.262 | 22.838 | 0.973 | 16.832 | 0.841 |
| STL-FNN | Huairou | 28.231 | 0.337 | 19.256 | 0.231 | 19.791 | 0.494 | 14.602 | 0.180 |
| STL-FNN | Changping | 27.452 | 0.543 | 17.589 | 0.590 | 16.510 | 0.260 | 11.481 | 0.118 |
| MTL-JGDA (fixed learning rate) | Tiantan | 30.654 | 0.813 | 21.133 | 0.672 | 20.629 | 1.635 | 15.136 | 1.045 |
| MTL-JGDA (fixed learning rate) | Huairou | 28.739 | 0.596 | 19.761 | 0.412 | 19.689 | 1.215 | 14.335 | 0.795 |
| MTL-JGDA (fixed learning rate) | Changping | 26.927 | 0.648 | 17.319 | 0.573 | 17.072 | 0.597 | 11.234 | 0.201 |
| MTL-JGDA (adaptive learning rate) | Tiantan | 30.709 | 0.453 | 21.570 | 0.628 | 20.571 | 1.133 | 15.269 | 0.480 |
| MTL-JGDA (adaptive learning rate) | Huairou | 28.151 | 0.163 | 19.405 | 0.259 | 17.827 | 0.581 | 13.368 | 0.253 |
| MTL-JGDA (adaptive learning rate) | Changping | 28.112 | 0.825 | 18.093 | 0.612 | 16.852 | 0.296 | 10.614 | 0.099 |
| MTL-PCGrad | Tiantan | 31.887 | 1.770 | 22.460 | 1.984 | 22.953 | 1.114 | 17.004 | 0.845 |
| MTL-PCGrad | Huairou | 29.044 | 1.358 | 20.585 | 1.662 | 18.639 | 0.561 | 13.740 | 0.589 |
| MTL-PCGrad | Changping | 30.843 | 2.017 | 21.210 | 2.312 | 18.135 | 0.589 | 11.696 | 0.164 |
| MTL-Gradvac | Tiantan | 32.084 | 0.806 | 22.331 | 0.630 | 19.712 | 0.567 | 14.386 | 0.765 |
| MTL-Gradvac | Huairou | 30.102 | 0.872 | 21.106 | 0.8045 | 21.320 | 1.222 | 15.637 | 0.841 |
| MTL-Gradvac | Changping | 31.898 | 0.844 | 21.277 | 0.632 | 17.090 | 0.409 | 11.215 | 0.240 |
| MTL-GradNorm | Tiantan | 32.953 | 2.916 | 23.129 | 2.020 | 23.313 | 1.320 | 17.364 | 1.170 |
| MTL-GradNorm | Huairou | 28.897 | 0.071 | 19.951 | 0.020 | 19.599 | 1.261 | 14.956 | 0.928 |
| MTL-GradNorm | Changping | 27.148 | 0.484 | 18.163 | 0.943 | 16.924 | 1.012 | 11.139 | 0.586 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wu, X.; Zhao, Y.; Yang, Y. Design of Multi-Task Parallel Model Based on Fuzzy Neural Networks and Joint Gradient Descent Algorithm. Appl. Sci. 2025, 15, 10386. https://doi.org/10.3390/app151910386