Harnessing the Power of Pre-Trained Models for Efficient Semantic Communication of Text and Images
Abstract
1. Introduction
- We propose three semantic codebook generation methods: (i) semantic quantization, which forms a fixed codebook from source realizations; (ii) semantic compression, which builds on quantization by exploiting semantic redundancy; and (iii) semantic vector-quantized autoencoder (VQ-AE), which trains a codebook.
- For semantic quantization and compression, we design a neural network-based source code that minimizes end-to-end semantic distortion.
- We introduce a novel performance metric, system time efficiency, that accounts for pre-communication overhead in machine-learning-aided wireless communication systems. It quantifies task completion within a given time budget by jointly accounting for the training and transmission phases, enabling fair comparisons and penalizing models with excessive training overhead. Simulations on multiple datasets demonstrate that semantic quantization and compression improve time efficiency over wireless channels, while reducing training cost and enhancing resilience under data scarcity relative to the learning-based semantic VQ-AE.
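As a toy illustration of the codebook idea in the contributions above, the sketch below quantizes pre-trained embeddings against a fixed codebook drawn from source realizations, so that only a codeword index crosses the channel. The random embeddings, dimensions, and codebook size here are placeholders, not the paper's models or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pre-trained embeddings of n source realizations
# (in the paper these would come from e.g. Sentence-BERT or CLIP).
n, d = 1000, 16
embeddings = rng.normal(size=(n, d))

# Semantic quantization: the codebook is a fixed subset of k
# embeddings drawn from the source realizations themselves.
k = 64
codebook = embeddings[rng.choice(n, size=k, replace=False)]

def encode(x, codebook):
    """Map an embedding to the index of its nearest codeword."""
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

def decode(idx, codebook):
    """Recover the codeword; only log2(k) bits cross the channel."""
    return codebook[idx]

x = embeddings[0]
idx = encode(x, codebook)
x_hat = decode(idx, codebook)
bits_per_message = int(np.ceil(np.log2(k)))  # 6 bits for k = 64
```

Semantic compression and the VQ-AE variant differ in how the codebook is obtained (exploiting semantic redundancy or training it end to end), but the encode/transmit-index/decode pattern is the same.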
2. Preliminaries
3. System Model
4. Proposed Methods
4.1. Semantic Quantization
Algorithm 1 Triplet Generation for Triplet Loss in Each Batch
Algorithm 2 Training Codeword Assignment Model
4.2. Semantic Compression
Algorithm 3 Codeword Assignment for Codebook Indices
4.3. Semantic Vector-Quantized Autoencoder
Algorithm 4 Index Perturbation for Adversarial Training
Algorithm 5 Training Semantic VQ AE Model
4.4. Performance Metrics
5. Results
5.1. Datasets and Simulation Settings
| Dataset | Method | Number of Bits | Acc. (%) |
|---|---|---|---|
| AG’s News | Conventional | 1,480,599 | |
| | Sem. Quan. | 26,000 | |
| | Sem. Comp. 501 * | 18,000 | |
| | Sem. VQ AE 501 + | 18,000 | |
| DBPedia 14 | Conventional | 2,160,054 | |
| | Sem. Quan. | 24,000 | |
| | Sem. Comp. 252 * | 16,000 | |
| | Sem. VQ AE 252 + | 16,000 | |
| CIFAR 10 | Conventional | 4,073,328 | |
| | Sem. Quan. | 24,000 | |
| | Sem. Comp. 147 * | 16,000 | |
| | Sem. VQ AE 147 + | 16,000 | |
| STL 10 | Conventional | 22,188,432 | |
| | Sem. Quan. | 22,000 | |
| | Sem. Comp. 88 * | 14,000 | |
| | Sem. VQ AE 88 + | 14,000 | |
| Multi Modal | Conventional | 23,659,964 | |
| | Sem. Quan. | 52,000 | |
| | Sem. Comp. 208 * | 32,000 | |
| | Sem. VQ AE 208 + | 32,000 | |
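The bit counts in the table above imply the compression factors directly; the short sketch below derives them per dataset. The ratios computed here are our arithmetic on the table's numbers, not figures reported by the paper.

```python
# Bit counts transcribed from the table above (conventional vs.
# semantic representations, per dataset).
bits = {
    "AG's News":   {"conventional": 1_480_599,  "sem_quan": 26_000, "sem_comp": 18_000},
    "DBPedia 14":  {"conventional": 2_160_054,  "sem_quan": 24_000, "sem_comp": 16_000},
    "CIFAR 10":    {"conventional": 4_073_328,  "sem_quan": 24_000, "sem_comp": 16_000},
    "STL 10":      {"conventional": 22_188_432, "sem_quan": 22_000, "sem_comp": 14_000},
    "Multi Modal": {"conventional": 23_659_964, "sem_quan": 52_000, "sem_comp": 32_000},
}

for name, b in bits.items():
    ratio = b["conventional"] / b["sem_comp"]
    print(f"{name}: semantic compression sends ~{ratio:.0f}x fewer bits")
```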
5.2. Source Compression
5.3. Wireless Channel Simulations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Dataset | Modality | Classes | Train Size |
|---|---|---|---|
| AG’s News | Text | 4 | 120,000 |
| DBPedia 14 | Text | 14 | 560,000 |
| CIFAR 10 | Image | 10 | 50,000 |
| STL 10 * | Image | 10 | 11,000 |
| Block | Parameter | AG’s News | DBPedia 14 | CIFAR 10 | STL 10 |
|---|---|---|---|---|---|
| | n | 5000 | 3000 | 1500 | |
| Class. Block | Batch Size | 128 | | | |
| | | 15 | | | |
| | LR Sched. | Exponential LR | | | |
| | | 0.75 | | | |
| | Initial LR | 0.001 | | | |
| | Optimizer | | | | |
| Codeword NN | Batch Size | 128 | | | |
| | | Semantic Quantization: 6; Semantic Compression: 10 | | | |
| | LR Sched. | Exponential LR | | | |
| | | 0.97 | | | |
| | Initial LR | 0.001 | | | |
| | Optimizer | | | | |
| | | 1 | | | |
| | | (0.25, 0.5, 0.25) | | | |
| Semantic VQ AE | k | 64 | | | |
| | | 4 | | | |
| | Batch Size | 128 | | | |
| | | 25 | | | |
| | LR Sched. | Exponential LR | | | |
| | | 0.9 | 0.98 | | |
| | Initial LR | 0.005 | | | |
| | Optimizer | | | | |
| | | 0.0001 | 0.00025 | 0.0005 | |
| | SNRmin | 0 dB | | | |
| | SNRmax | 21 dB | | | |
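The SNR range in the parameter table (0 dB to 21 dB) can be exercised with a minimal channel sketch. This is not the paper's simulation pipeline: the AWGN model with unit-power real symbols, BPSK mapping, and hard-decision detection are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def awgn(x, snr_db):
    """Pass unit-power symbols through a real AWGN channel at the given SNR."""
    snr = 10 ** (snr_db / 10)
    noise_std = np.sqrt(1 / snr)
    return x + noise_std * rng.normal(size=x.shape)

# BPSK transmission of codeword-index bits over the SNR range in the table.
tx_bits = rng.integers(0, 2, size=10_000)
symbols = 1.0 - 2.0 * tx_bits        # 0 -> +1, 1 -> -1
for snr_db in (0, 21):               # SNRmin and SNRmax
    y = awgn(symbols, snr_db)
    detected = (y < 0).astype(int)   # hard-decision BPSK detector
    ber = np.mean(detected != tx_bits)
    print(f"SNR {snr_db:2d} dB: empirical BER = {ber:.4f}")
```

At 0 dB the raw bit error rate is appreciable (theoretically Q(sqrt(2)) ≈ 0.08 for BPSK), which is what makes a semantically robust codeword assignment matter at the low end of the table's SNR range.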
| Method | Corresponding Message |
|---|---|
| s | Wiltshire Police warns about “phishing” after its fraud squad chief was targeted. |
| | Do-it-yourself phishing kits are freely available on the Internet, a security firm said Thursday, and they will lead to more scams sent to online consumers. |
| | Do-it-yourself phishing kits are being made available for download free of charge from the interest, security watchers have warned. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kutay, E.; Yener, A. Harnessing the Power of Pre-Trained Models for Efficient Semantic Communication of Text and Images. Entropy 2025, 27, 813. https://doi.org/10.3390/e27080813