Natural Language Processing in Generating Industrial Documentation Within Industry 4.0/5.0
Abstract
1. Introduction
1.1. Observed Gaps
1.2. Novelty and Contribution
1.3. Goal and Research Questions
- RQ1: How are deep learning-based NLP methods currently used to generate, manage, and update industrial documentation?
- RQ2: What are the existing research gaps, technological barriers, and untapped opportunities in this interdisciplinary field?
- RQ3: What deep learning methods are currently used in NLP for industrial documentation generation, and how are they being implemented?
- RQ4: To what extent do existing solutions meet requirements such as factual accuracy, regulatory compliance, explainability, and real-time responsiveness?
2. Materials and Methods
2.1. Dataset
2.2. Methods
2.3. Data Selection
3. Results
3.1. Traditional Industrial Document Generation
3.2. Document Generation Within Industry 4.0
3.3. Document Generation Within Industry 5.0
3.4. Document Generation Within Industry 6.0
3.5. Cybersecurity of Automatically Generated Industrial Documents
- Threat vectors:
  - Injection of malicious data into training data or high-speed input streams;
  - Unauthorized access to generated content (IP theft, industrial espionage);
  - Misleading or falsified generated documentation.
- Attack surfaces:
  - Immutable storage of generated documents;
  - Tamper-proof version control;
  - Audit trail for document revisions;
  - Smart contracts for securely running document workflows.
- IPFS for decentralized storage;
- Solutions for industrial smart contracts.
- Generated document hash stored on-chain;
- Metadata (author, timestamp, revision ID) stored in a ledger;
- Verification mechanisms via smart contracts or zero-knowledge proofs;
- Federated/decentralized AI inference for enhanced security [61].
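The hash-and-metadata scheme listed above can be illustrated with a minimal, single-process sketch in Python. It assumes SHA-256 as the document hash and models the ledger as a hypothetical in-memory, hash-chained list (`DocumentLedger` is an illustrative name, not a component described by the authors); a real deployment would anchor records on a blockchain or IPFS and run verification in smart contracts or zero-knowledge proofs, none of which the sketch attempts.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DocumentLedger:
    """Hypothetical append-only ledger: each entry stores a document hash,
    its metadata, and the hash of the previous entry (a simple hash chain)."""
    entries: list = field(default_factory=list)

    def append(self, document: str, author: str, timestamp: str, revision_id: str) -> dict:
        prev = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        record = {
            "doc_hash": sha256_hex(document.encode("utf-8")),
            "author": author,
            "timestamp": timestamp,
            "revision_id": revision_id,
            "prev_hash": prev,
        }
        # Hash the canonical JSON form so any later edit to the record is detectable.
        record["entry_hash"] = sha256_hex(json.dumps(record, sort_keys=True).encode("utf-8"))
        self.entries.append(record)
        return record

    def verify_document(self, document: str, revision_id: str) -> bool:
        """Check that a document's text matches the hash recorded for a revision."""
        digest = sha256_hex(document.encode("utf-8"))
        return any(e["revision_id"] == revision_id and e["doc_hash"] == digest
                   for e in self.entries)

    def verify_chain(self) -> bool:
        """Recompute every entry hash and back-link; False if any record was altered."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev:
                return False
            if sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8")) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True


if __name__ == "__main__":
    ledger = DocumentLedger()
    ledger.append("Step 1: Isolate power.", "op-17", "2025-01-01T08:00Z", "rev-1")
    print(ledger.verify_document("Step 1: Isolate power.", "rev-1"))  # True
    ledger.entries[0]["author"] = "someone-else"  # simulate tampering with metadata
    print(ledger.verify_chain())  # False
```

The chaining (each entry stores the previous entry's hash) is what makes the audit trail tamper-evident: altering any stored record breaks `verify_chain`, while `verify_document` confirms that a given revision's text has not changed since it was recorded.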
4. Discussion
- RQ1: How are deep learning-based NLP methods currently used to generate, manage, and update industrial documentation? Addressed in Results, Sections 3.2 and 3.3.
- RQ2: What are the existing research gaps, technological barriers, and untapped opportunities in this interdisciplinary field? Addressed in Results, Section 3.4, and Discussion, Sections 4.1 and 4.6.
- RQ3: What deep learning methods are currently used in NLP for industrial documentation generation, and how are they being implemented? Addressed in Discussion, Sections 4.1 and 4.2.
- RQ4: To what extent do existing solutions meet requirements such as factual accuracy, regulatory compliance, explainability, and real-time responsiveness? Addressed in Discussion, Section 4.5.
4.1. Limitations
4.2. Technological Implications
4.3. Economic Implications
4.4. Social Implications
4.5. Legal and Ethical Implications
4.6. Directions for Further Research
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| AI | Artificial intelligence |
| DL | Deep learning |
| GDPR | General Data Protection Regulation |
| IoT | Internet of Things |
| ML | Machine learning |
| NLP | Natural language processing |
| SDG | Sustainable development goal |
| SME | Small and medium enterprise |
| XAI | eXplainable artificial intelligence |
References
- Rojek, I.; Dostatni, E.; Mikołajewski, D.; Pawłowski, L.; Wegrzyn-Wolska, K. Modern approach to sustainable production in the context of Industry 4.0. Bull. Pol. Acad. Sci. Tech. Sci. 2022, 70, e143828.
- Corrêa Cordeiro, F.; Ferreira da Silva, P.; Tessarollo, A.; Freitas, C.; de Souza, E.; da Silva Magalhães Gomes, D.; Rocha Souza, R.; Codeço Coelho, F. Petro NLP: Resources for natural language processing and information extraction for the oil and gas industry. Comput. Geosci. 2024, 193, 105714.
- He, S.; Lu, Y. A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning. Electronics 2019, 8, 1417.
- Kim, H.-S.; Kang, J.-W.; Choi, S.-Y. ChatGPT vs. Human Journalists: Analyzing News Summaries Through BERT Score and Moderation Standards. Electronics 2025, 14, 2115.
- Mikołajewska, E.; Masiak, J. Deep Learning Approaches to Natural Language Processing for Digital Twins of Patients in Psychiatry and Neurological Rehabilitation. Electronics 2025, 14, 2024.
- Kayabas, A.; Topcu, A.E.; Alzoubi, Y.I.; Yıldız, M. A Deep Learning Approach to Classify AI-Generated and Human-Written Texts. Appl. Sci. 2025, 15, 5541.
- Pan, K.; Zhang, X.; Chen, L. Research on the Training and Application Methods of a Lightweight Agricultural Domain-Specific Large Language Model Supporting Mandarin Chinese and Uyghur. Appl. Sci. 2024, 14, 5764.
- Orji, E.Z.; Haydar, A.; Erşan, İ.; Mwambe, O.O. Advancing OCR Accuracy in Image-to-LaTeX Conversion—A Critical and Creative Exploration. Appl. Sci. 2023, 13, 12503.
- Feichter, C.; Schlippe, T. Investigating Models for the Transcription of Mathematical Formulas in Images. Appl. Sci. 2024, 14, 1140.
- Lamaakal, I.; Maleh, Y.; El Makkaoui, K.; Ouahbi, I.; Pławiak, P.; Alfarraj, O.; Almousa, M.; Abd El-Latif, A.A. Tiny Language Models for Automation and Control: Overview, Potential Applications, and Future Research Directions. Sensors 2025, 25, 1318.
- Al-Safi, H.; Ibrahim, H.; Steenson, P. Vega: LLM-Driven Intelligent Chatbot Platform for Internet of Things Control and Development. Sensors 2025, 25, 3809.
- Javed, S.; Usman, M.; Sandin, F.; Liwicki, M.; Mokayed, H. Deep Ontology Alignment Using a Natural Language Processing Approach for Automatic M2M Translation in IIoT. Sensors 2023, 23, 8427.
- Wang, J.; Tang, Y.; He, S.; Zhao, C.; Sharma, P.K.; Alfarraj, O.; Tolba, A. Log Event2vec: Log Event-to-Vector Based Anomaly Detection for Large-Scale Logs in Internet of Things. Sensors 2020, 20, 2451.
- Rojek, I. Hybrid Neural Networks as Prediction Models. In Artificial Intelligence and Soft Computing, Lecture Notes in Artificial Intelligence; Rutkowski, L., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 88–95.
- Rojek, I. Models for Better Environmental Intelligent Management within Water Supply Systems. Water Resour. Manag. 2014, 28, 3875–3890.
- Vaiyapuri, T.; Jagannathan, S.K.; Ahmed, M.A.; Ramya, K.C.; Joshi, G.P.; Lee, S.; Lee, G. Sustainable Artificial Intelligence-Based Twitter Sentiment Analysis on COVID-19 Pandemic. Sustainability 2023, 15, 6404.
- Cortés-Caicedo, B.; Grisales-Noreña, L.F.; Montoya, O.D.; Rodriguez-Cabal, M.A.; Rosero, J.A. Energy Management System for the Optimal Operation of PV Generators in Distribution Systems Using the Antlion Optimizer: A Colombian Urban and Rural Case Study. Sustainability 2022, 14, 16083.
- Jamii, J.; Trabelsi, M.; Mansouri, M.; Mimouni, M.F.; Shatanawi, W. Non-Linear Programming-Based Energy Management for a Wind Farm Coupled with Pumped Hydro Storage System. Sustainability 2022, 14, 11287.
- Choi, S.-W.; Lee, E.-B.; Kim, J.-H. The Engineering Machine-Learning Automation Platform (EMAP): A Big-Data-Driven AI Tool for Contractors’ Sustainable Management Solutions for Plant Projects. Sustainability 2021, 13, 10384.
- Alqahtani, E.; Janbi, N.; Sharaf, S.; Mehmood, R. Smart Homes and Families to Enable Sustainable Societies: A Data-Driven Approach for Multi-Perspective Parameter Discovery Using BERT Modelling. Sustainability 2022, 14, 13534.
- Ali, S.; Shirazi, F. A Transformer-Based Machine Learning Approach for Sustainable E-Waste Management: A Comparative Policy Analysis between the Swiss and Canadian Systems. Sustainability 2022, 14, 13220.
- Nam, S.; Yoon, S.; Raghavan, N.; Park, H. Identifying Service Opportunities Based on Outcome-Driven Innovation Framework and Deep Learning: A Case Study of Hotel Service. Sustainability 2021, 13, 391.
- Rojek, I.; Kowal, M.; Stoic, A. Predictive compensation of thermal deformations of ball screws in CNC machines using neural networks. Teh. Vjesn.-Tech. Gaz. 2017, 24, 1697–1703.
- Rojek, I.; Mikołajewski, D.; Kotlarz, P.; Tyburek, K.; Kopowski, J.; Dostatni, E. Traditional Artificial Neural Networks Versus Deep Learning in Optimization of Material Aspects of 3D Printing. Materials 2021, 14, 7625.
- Bäckstrand, E.; Djupedal, R.; Öberg, L.M.; de Oliveira Neto, F.G. Unveiling Disparities: NLP Analysis of Software Industry and Vocational Education Gaps. In Proceedings of the Third ACM/IEEE International Workshop on NL-Based Software Engineering, Lisbon, Portugal, 20 April 2024; pp. 9–16.
- Lindh-Knuutila, T.; Loftsson, H.; Doval, P.A.; Andersson, S.; Barkarson, B.; Cerezo-Costas, H.; Guðnason, J.; Gylfason, J.; Hemminki, J.; Kaalep, H.J. Microservices at Your Service: Bridging the Gap between NLP Research and Industry. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), Tórshavn, Faroe Islands, 22–24 May 2023; pp. 86–91.
- Rojas González, T.; Rocha, M.A.; Baltazar, R.; Casillas, M.A.; del Valle, J. Voice Control System Through NLP (Natural Language Processing) as an Interactive Model for Scalable ERP Platforms in Industry 4.0. In Agents and Multi-Agent Systems: Technologies and Applications 2022, Proceedings of the 16th KES International Conference, KES-AMSTA, Rhodes, Greece, 20–22 June 2022; pp. 207–217.
- Mantzaris, A.V. Benchmark Data NLP.jl: Synthetic Data Generation for NLP Benchmarking. J. Open Source Softw. 2025, 10, 7844.
- Papadimas, C.; Ragazou, V.; Karasavvidis, I.; Kollias, V. Predicting learning performance using NLP: An exploratory study using two semantic textual similarity methods. Knowl. Inf. Syst. 2025, 67, 4567–4595.
- Bourdin, M.; Paviot, T.; Pellerin, R.; Lamouri, S. NLP in SMEs for industry 4.0: Opportunities and challenges. Procedia Comput. Sci. 2024, 239, 396–403.
- Lee, H.; Hyun Kim, J.; Sun Jung, H. ESG-KIBERT: A new paradigm in ESG evaluation using NLP and industry-specific customization. Decis. Support Syst. 2025, 193, 114440.
- Czeczot, G.; Rojek, I.; Mikołajewski, D.; Sangho, B. AI in IIoT Management of Cybersecurity for Industry 4.0 and Industry 5.0 Purposes. Electronics 2023, 12, 3800.
- Rojek, I.; Mroziński, A.; Kotlarz, P.; Macko, M.; Mikołajewski, D. AI-Based Computational Model in Sustainable Transformation of Energy Markets. Energies 2023, 16, 8059.
- Alrashidi, B.; Jamal, A.; Alkhathlan, A. Abusive Content Detection in Arabic Tweets Using Multi-Task Learning and Transformer-Based Models. Appl. Sci. 2023, 13, 5825.
- Al Duhayyim, M.; Alazwari, S.; Mengash, H.A.; Marzouk, R.; Alzahrani, J.S.; Mahgoub, H.; Althukair, F.; Salama, A.S. Metaheuristics Optimization with Deep Learning Enabled Automated Image Captioning System. Appl. Sci. 2022, 12, 7724.
- Kumar, G.; Basri, S.; Imam, A.A.; Khowaja, S.A.; Capretz, L.F.; Balogun, A.O. Data Harmonization for Heterogeneous Datasets: A Systematic Literature Review. Appl. Sci. 2021, 11, 8275.
- Avci, C.; Tekinerdogan, B.; Athanasiadis, I.N. Software architectures for big data: A systematic literature review. Big Data Anal. 2020, 5, 5.
- Maheshwari, H.; Verma, L.; Chandra, U. Overview of Big Data and Its Issues. Int. J. Res. Electron. Comput. Eng. 2019, 7, 256.
- Younan, M.; Houssein, E.H.; Elhoseny, M.; Ali, A.A. Challenges and recommended technologies for the industrial internet of things: A comprehensive review. Measurement 2020, 151, 107198.
- Wang, Y.; Jan, M.N.; Chu, S.; Zhu, Y. Use of Big Data Tools and Industrial Internet of Things: An Overview. Sci. Program. 2020, 2020, 8810634.
- Ralph, B.; Stockinger, M. Digitalization and digital transformation in metal forming: Key technologies, challenges and current developments of industry 4.0 applications. In Proceedings of the XXXIX Colloquium on Metal Forming, Leoben, Austria, 21–25 March 2020.
- Kourou, K.D.; Pezoulas, V.C.; Georga, E.I.; Exarchos, T.P.; Tsanakas, P.; Tsiknakis, M.; Varvarigou, T.; De Vita, S.; Tzioufas, A.; Fotiadis, D.I.I. Cohort Harmonization and Integrative Analysis from a Biomedical Engineering Perspective. IEEE Rev. Biomed. Eng. 2018, 12, 303–318.
- Stoyanova, M.; Nikoloudakis, Y.; Panagiotakis, S.; Pallis, E.; Markakis, E.K. A Survey on the Internet of Things (IoT) Forensics: Challenges, Approaches, and Open Issues. IEEE Commun. Surv. Tutor. 2020, 22, 1191–1221.
- Sahu, A.K.; Sahu, A.K.; Sahu, N.K. A Review on the Research Growth of Industry 4.0: IIoT Business Architectures Benchmarking. Int. J. Bus. Anal. 2020, 7, 77–97.
- Khan, M.; Wu, X.; Xu, X.; Dou, W. Big data challenges and opportunities in the hype of Industry 4.0. In Proceedings of the 2017 IEEE International Conference on Communications (ICC), Paris, France, 21–25 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6.
- Sajid, S.; Haleem, A.; Bahl, S.; Javaid, M.; Goyal, T.; Mittal, M. Data science applications for predictive maintenance and materials science in context to Industry 4.0. Mater. Today Proc. 2021, 45, 4898–4905.
- Jagtap, S.; Bader, F.; Garcia-Garcia, G.; Trollman, H.; Fadiji, T.; Salonitis, K. Food Logistics 4.0: Opportunities and Challenges. Logistics 2020, 5, 2.
- De Vass, T.; Shee, H.; Miah, S. IoT in Supply Chain Management: Opportunities and Challenges for Businesses in Early Industry 4.0 Context. Oper. Supply Chain Manag. Int. J. 2021, 14, 148–161.
- Gong, L.; Fast-Berglund, A.; Johansson, B. A Framework for Extended Reality System Development in Manufacturing. IEEE Access 2021, 9, 24796–24813.
- Poria, S.; Cambria, E.; Bajpai, R.; Hussain, A. A review of affective computing: From unimodal analysis to multimodal fusion. Inf. Fusion 2017, 37, 98–125.
- Shoumy, N.J.; Ang, L.-M.; Seng, K.P.; Rahaman, D.; Zia, T. Multimodal big data affective analytics: A comprehensive survey using text, audio, visual and physiological signals. J. Netw. Comput. Appl. 2020, 149, 102447.
- Verma, J.P.; Agrawal, S.; Patel, B.; Patel, A. Big data analytics: Challenges and applications for text, audio, video, and social media data. Int. J. Soft Comput. Artif. Intell. Appl. 2016, 5, 41–51.
- Duwairi, R.; Hayajneh, A.; Quwaider, M. A Deep Learning Framework for Automatic Detection of Hate Speech Embedded in Arabic Tweets. Arab. J. Sci. Eng. 2021, 46, 4001–4014.
- Pontikis, I.; Koutivas, S.; Stafylas, D. Greek Patent Classification Using Deep Learning. In Proceedings of the Novel & Intelligent Digital Systems: The 2nd International Conference on Novel and Intelligent Digital Systems-NiDS, Athens, Greece, 29–30 September 2022; Volume 556, pp. 372–381.
- Trappey, A.J.C.; Trappey, C.V.; Wang, J.W.C. Intelligent compilation of patent summaries using machine learning and natural language processing techniques. Adv. Eng. Inform. 2022, 43, 101027.
- Kwon, B.; Kim, J.; Mun, D. Construction of design requirements knowledge base from unstructured design guidelines using natural language processing. Comput. Ind. 2024, 159, 104100.
- Rahman, M.M.; Finin, T. Deep Understanding of a Document’s Structure. In Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, Austin, TX, USA, 5–8 December 2017; pp. 63–73.
- Xiong, H.; Wu, Y.; Jin, C.; Kumari, S. Efficient and Privacy-Preserving Authentication Protocol for Heterogeneous Systems in IIoT. IEEE Internet Things J. 2020, 7, 11713–11724.
- James, Y.; Szymanezyk, O. The Challenges of Integrating Industry 4.0 in Cyber Security—A Perspective. Int. J. Inf. Educ. Technol. 2021, 11, 242–247.
- Shao, X.-F.; Liu, W.; Li, Y.; Chaudhry, H.R.; Yue, X.-G. Multistage implementation framework for smart supply chain management under industry 4.0. Technol. Forecast. Soc. Change 2021, 162, 120354.
- Giarelis, N.; Karacapilidis, N. Deep learning and embeddings-based approaches for key phrase extraction: A literature review. Knowl. Inf. Syst. 2024, 66, 6493–6526.
- IEC 61360-1:2017; Standard Data Element Types with Associated Classification Scheme—Part 1: Definitions—Principles and Methods. International Electrotechnical Commission: Geneva, Switzerland, 2017. Available online: https://webstore.iec.ch/en/publication/28560 (accessed on 28 October 2025).
- ISO 15531-1:2004; Industrial Automation Systems and Integration—Industrial Manufacturing Management Data Part 1: General Overview. International Organization for Standardization: Geneva, Switzerland, 2004. Available online: https://www.iso.org/standard/28144.html (accessed on 28 October 2025).
- Wu, C.K.; Li, X.; Yang, Z.L. Natural language processing for smart construction: Current status and future directions. Autom. Constr. 2022, 134, 104059.
- Park, S.; Lee, W.; Lee, J. Learning of indiscriminate distributions of document embeddings for domain adaptation. Intell. Data Anal. 2019, 23, 779–797.
- Mohammadi, H.; Giachanou, A.; Bagheri, A. A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction. Appl. Sci. 2024, 14, 8620.
- Pérez-Landa, G.I.; Loyola-González, O.; Medina-Pérez, M.A. An Explainable Artificial Intelligence Model for Detecting Xenophobic Tweets. Appl. Sci. 2021, 11, 10801.
- Directive (EU) 2022/2555 of the European Parliament and of the Council of 14 December 2022 on Measures for a High Common Level of Cybersecurity Across the Union, Amending Regulation (EU) No 910/2014 and Directive (EU) 2018/1972, and Repealing Directive (EU) 2016/1148 (NIS 2 Directive). Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32022L2555 (accessed on 28 October 2025).
- ISO/IEC 20000-1:2018; Information Technology—Service Management Part 1: Service Management System Requirements. International Organization for Standardization: Geneva, Switzerland, 2018. Available online: https://www.iso.org/standard/70636.html (accessed on 28 October 2025).
- Kim, Y.; Park, K.; Yoo, B. Natural language processing-based approach for automatically coding ship sensor data. Int. J. Nav. Archit. Ocean. Eng. 2024, 16, 100581.
- Pathak, A.R.; Pandey, M.; Rautaray, S. Empirical evaluation of deep learning models for sentiment analysis. J. Stat. Manag. Syst. 2019, 22, 741–752.
- Li, Z.; Zhang, Q.; Wang, Y.; Wang, S. Social Media Rumor Refuter Feature Analysis and Crowd Identification Based on XGBoost and NLP. Appl. Sci. 2020, 10, 4711.
- Wu, H.; Zhong, B.; Medjdoub, B.; Xing, X.; Jiao, L. An Ontological Metro Accident Case Retrieval Using CBR and NLP. Appl. Sci. 2020, 10, 5298.
- Mishra, R.K.; Raj, H.; Urolagin, S.; Jothi, J.A.A.; Nawaz, N. Cluster-Based Knowledge Graph and Entity-Relation Representation on Tourism Economical Sentiments. Appl. Sci. 2022, 12, 8105.
- Gulnerman, A.G. Do Spatial Trajectories of Social Media Users Imply the Credibility of the Users’ Tweets During Earthquake Crisis Management? Appl. Sci. 2025, 15, 6897.
- Liu, K.; Wang, P.; Liu, J. Evaluation of Cultural Ecosystem Services in the Urban Parks of Macau from a Cultural Heritage Perspective. Appl. Sci. 2025, 15, 3946.
- Zhou, J.; Zhang, H. Transforming Education in the AI Era: A Technology–Organization–Environment Framework Inquiry into Public Discourse. Appl. Sci. 2025, 15, 3886.
- Ptaszynski, M.; Dybala, P.; Rzepka, R. Application of Artificial Intelligence Methods in Processing of Emotions, Decisions, and Opinions. Appl. Sci. 2024, 14, 5912.
- Peykani, P.; Ramezanlou, F.; Tanasescu, C.; Ghanidel, S. Large Language Models: A Structured Taxonomy and Review of Challenges, Limitations, Solutions, and Future Directions. Appl. Sci. 2025, 15, 8103.
- Benavides-Astudillo, E.; Fuertes, W.; Sanchez-Gordon, S.; Nuñez-Agurto, D.; Rodríguez-Galán, G. A Phishing-Attack-Detection Model Using Natural Language Processing and Deep Learning. Appl. Sci. 2023, 13, 5275.
- Wani, M.A.; El Affendi, M.; Shakil, K.A. AI-Generated Spam Review Detection Framework with Deep Learning Algorithms and Natural Language Processing. Computers 2024, 13, 264.
- Eleftheriadis, P.; Perikos, I.; Hatzilygeroudis, I. Evaluating Deep Learning Techniques for Natural Language Inference. Appl. Sci. 2023, 13, 2577.
- Ricketts, J.; Barry, D.; Guo, W.; Pelham, J. A Scoping Literature Review of Natural Language Processing Application to Safety Occurrence Reports. Safety 2023, 9, 22.
- Park, H.A.; Jeon, I.; Shin, S.-H.; Seo, S.Y.; Lee, J.J.; Kim, C.; Park, J.O. Natural Language Processing-Based Deep Learning to Predict the Loss of Consciousness Event Using Emergency Department Text Records. Appl. Sci. 2024, 14, 11399.
- Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71.








| Stage Name | Tasks |
|---|---|
| Defining research objectives | Defining goals of the bibliometric analysis |
| Selecting databases and data collections | Choosing appropriate dataset(s) and developing search queries according to the study goals |
| Data preprocessing | Cleaning the collected data to remove duplicates and irrelevant records |
| Bibliometric software selection | Choosing a suitable bibliometric software tool for the analysis |
| Data analysis | Description, Author, Journal, Area/Topics, Institution/Country, etc. |
| Visualization (where possible) | Visualizing the analysis results to present insights |
| Interpretation and discussion | Interpreting findings in the context of the research goals |
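The "Data preprocessing" stage in the table above (removing duplicates from records collected across several databases) can be sketched in Python as follows. The record fields `doi` and `title`, and the rule that two records are duplicates when they share a DOI or a normalized title, are assumptions for illustration rather than the procedure reported by the authors; screening out irrelevant records is not covered.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case a title, drop combining accent marks and punctuation, collapse spaces.
    Note: characters outside a-z/0-9 that have no ASCII decomposition are dropped too."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each publication across databases.
    Two records count as duplicates if they share a DOI or a normalized title."""
    seen: set[str] = set()
    unique: list[dict] = []
    for rec in records:
        keys = set()
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add("doi:" + doi)
        title = normalize_title(rec.get("title") or "")
        if title:
            keys.add("title:" + title)
        # Records with neither key cannot be matched, so they are always kept.
        if not keys or seen.isdisjoint(keys):
            unique.append(rec)
        seen.update(keys)
    return unique
```

Matching on either key, rather than on DOI alone, lets a record exported without a DOI from one database still collapse onto the same paper exported with a DOI from another.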
| Parameter | Description |
|---|---|
| Inclusion criteria | Articles (original, reviews), books and chapters published since 2018, including conference proceedings, in English |
| Exclusion criteria | Books published before 2018, letters, communications, editorials, conference abstracts without full text, languages other than English |
| Keywords used | artificial intelligence, AI, machine learning, ML, deep learning, DL, digital twin, personalization, adaptation |
| Used field codes (WoS) | “Subject” field (consisting of title, abstract, Keywords Plus and other keywords) |
| Used field codes (Scopus) | Article title, abstract and keywords |
| Used field codes (dblp) | Manually |
| Boolean operators used | Yes, e.g., “digital twin” AND (“AI” OR “ML”) AND adaptation |
| Applied filters | Results refined by publication year, document type (e.g., articles, reviews), and subject area (e.g., industry, engineering). |
| Iteration and validation options | Queries run iteratively, refined based on the results, and validated by checking that relevant articles appear among the top hits |
| Truncation and wildcards used | Symbols such as * for word variations (e.g., “deep learn*”) and ? for alternative spellings (e.g., “anali?ed”) |
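The Boolean search strategy in the table above can be expressed as a small query builder in Python. The `TS=` (Web of Science Topic, covering title, abstract and keyword fields) and `TITLE-ABS-KEY()` (Scopus) wrappers are real database syntax, but the function names and the grouping of keywords into concepts are illustrative assumptions, not the authors' published query. Wildcard terms such as `learn*` pass through unchanged.

```python
def or_group(terms: list[str]) -> str:
    """Join synonyms with OR, quoting multi-word phrases; a single term stays bare."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return quoted[0] if len(quoted) == 1 else "(" + " OR ".join(quoted) + ")"


def build_query(concepts: list[list[str]]) -> str:
    """AND together one OR-group per concept (each inner list holds synonyms)."""
    return " AND ".join(or_group(c) for c in concepts)


def for_wos(query: str) -> str:
    """Wrap a query in the Web of Science Topic field tag."""
    return f"TS=({query})"


def for_scopus(query: str) -> str:
    """Wrap a query in the Scopus title/abstract/keywords field code."""
    return f"TITLE-ABS-KEY({query})"


if __name__ == "__main__":
    # Concept groups mirror the example Boolean query given in the table.
    q = build_query([["digital twin"], ["AI", "ML"], ["adaptation"]])
    print(for_wos(q))     # TS=("digital twin" AND (AI OR ML) AND adaptation)
    print(for_scopus(q))  # TITLE-ABS-KEY("digital twin" AND (AI OR ML) AND adaptation)
```

Generating the WoS and Scopus variants from one concept list keeps the iterative refinement described in the table consistent across databases.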
| Parameter/Feature | Value |
|---|---|
| Leading types of publication | Conference paper (69.60%), Article (21.70%) |
| Leading areas of science | Computer science artificial intelligence, Computer science information systems |
| Leading countries | India, China |
| Leading scientists | None |
| Leading affiliations | None |
| Leading funders (where information available) | None |
| Sustainable development goals (SDGs) | Industry, Innovation and Infrastructure; Good Health and Well-being |

| Category | Representative Models/Tools | Strengths | Weaknesses | Quantitative Evaluations | Common Datasets Used |
|---|---|---|---|---|---|
| Rule-Based NLG Systems | GATE, SimpleNLG, custom expert systems | Highly predictable output, high explainability, strong compliance with industry standards | Low scalability, hard to maintain rules, poor adaptation to new formats | Precision/recall of rule firing, manual quality scoring | Proprietary enterprise process descriptions, lack of major public datasets |
| Statistical NLP (pre-neural) | n-gram LMs, SMT for MT | Fast and lightweight, consistent for repetitive documentation | Weak contextual handling, poor performance with technical vocabulary | BLEU, METEOR, Perplexity | Europarl (MT baseline), JRC Acquis, Limited industrial corpora |
| Transformer Models (Generic) | BERT, GPT-2/3, T5, BART | Strong contextual understanding, better long-range coherence than classical models | Needs fine-tuning for domain-specific rigor, possible hallucinations | BLEU, ROUGE, BERTScore, factuality measures (FactCC) | MIMIC-III (medical tech), arXiv/Patent corpora, WikiDocs |
| Domain-Adapted LLMs | FLAN-T5 industrial variants, GPT-4/5 fine-tuned on technical corpora | High accuracy in technical writing, strong reasoning, good metadata-to-text generation | Requires high-quality proprietary data, risk of exposing company IP | Expert Review Scores, task-specific metrics (e.g., assembly-step accuracy) | Enterprise manufacturing manuals, fault logs and maintenance reports |
| Retrieval-augmented generation (RAG) | GPT + vector DB, Elasticsearch + T5, LlamaIndex, LangChain | High factuality, transparent citations, suitable for revising existing manuals | Requires a well-curated knowledge base, added latency from retrieval | Factual completeness scores, document alignment metrics | Custom enterprise, digital twin logs, semantic metadata graphs |
| Multimodal NLP for documentation | Vision Transformers + LLMs (PaLI, LLaVA, GPT-4o) | Converts CAD, schematics, and sensor data into text; well suited to Industry 4.0/5.0/6.0 scenarios | High computational load, limited industrial image datasets | Text–image alignment metrics, multimodal BLEU/ROUGE | COCO, OpenImages, CAD/schematic datasets from industry partners |
| Ontology-driven NLP | OWL/RDF + LLMs, industrial taxonomy-driven generation | Guaranteed terminology consistency, strong alignment with IEC/ISO standards | Requires heavy knowledge-engineering, slow updates | Ontology-consistency checks, terminology adherence rate | Industry ontologies: IEC 61360 [62], ISO 15531 [63] |
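The retrieval step that distinguishes RAG from the other categories in the table above can be sketched in Python with no external dependencies. As an explicit simplification, bag-of-words cosine similarity stands in for the embedding model and vector database listed in the table, the passage IDs and texts in the usage example are hypothetical, and the final call to an LLM is omitted; the sketch only shows how retrieved passages become citable context for the generator.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Rank passages by similarity to the query and return the top k (id, score) pairs."""
    q = Counter(tokenize(query))
    scored = [(pid, cosine(q, Counter(tokenize(text)))) for pid, text in passages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_grounded_prompt(query: str, passages: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt that restricts the generator to retrieved, citable sources."""
    hits = [(pid, s) for pid, s in retrieve(query, passages, k) if s > 0]
    sources = "\n".join(f"[{pid}] {passages[pid]}" for pid, _ in hits)
    return (
        "Answer using only the sources below and cite each claim by source ID.\n"
        f"Sources:\n{sources}\n\nTask: {query}"
    )


if __name__ == "__main__":
    manual = {  # hypothetical excerpts from a maintenance manual and a fault log
        "MAN-3.1": "Lubricate the ball screw every 500 operating hours.",
        "MAN-4.2": "Before maintenance, isolate the spindle drive and lock out power.",
        "LOG-0917": "Alarm 217: spindle drive overtemperature detected.",
    }
    print(build_grounded_prompt("lock out power before spindle maintenance", manual))
```

Keeping source IDs in the prompt is what enables the "transparent citations" strength listed for RAG: each statement in the generated text can be traced back to a specific manual section or log entry.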
| Priority Area | Description |
|---|---|
| Short-term priorities (1–2 years) | |
| Model accuracy and reliability | Enhance precision of NLP outputs for safety-critical and engineering documentation |
| Explainable NLP systems | Develop interpretable models to improve trust, auditing, and compliance |
| Cross-industry benchmarking | Establish shared datasets, ontologies, and performance indicators for industrial documentation automation |
| Medium-term priorities (3–5 years) | |
| Multimodal documentation generation | Combine text, sensor data, voice, and visual streams to produce context-aware industrial reports. |
| Human-centric NLP for Industry 5.0 | Create adaptive, personalized, and multilingual documentation interfaces for diverse operator skill levels. |
| Collaborative human–AI documentation workflows | Enable mixed-initiative systems where operators and NLP models co-author reports. |
| Edge-enabled real-time NLP | Deploy lightweight NLP models that operate with low latency on industrial edge devices while preserving privacy. |
| Long-term priorities (over 5 years) | |
| Autonomous documentation ecosystems (Industry 6.0) | Explore self-evolving NLP systems that update, validate, and optimize documentation across global manufacturing networks |
| Ethical and legal frameworks | Define authorship, accountability, and integrity standards for AI-generated industrial records |
| Economic impact assessments at scale | Analyze costs and workforce transformation as NLP-driven documentation becomes widespread |
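One candidate performance indicator for the cross-industry benchmarking priority is the terminology adherence rate named in the ontology-driven row of the comparison table. The definition below (preferred-term mentions divided by all mentions of preferred and non-preferred terms) and the synonym mapping are assumptions for illustration; in practice the mapping would be derived from an ontology such as IEC 61360 [62], which the sketch does not parse.

```python
import re


def terminology_adherence(text: str, preferred: dict[str, list[str]]) -> float:
    """Share of domain-term mentions that use the preferred label.

    `preferred` maps each preferred term to its non-preferred synonyms (a
    hypothetical mapping). Matching is case-insensitive on word boundaries;
    overlapping terms (e.g. "screw" inside "ball screw") would be double-counted.
    Returns 1.0 when the text mentions none of the terms.
    """
    lowered = text.lower()

    def count(term: str) -> int:
        return len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", lowered))

    good = sum(count(p) for p in preferred)
    bad = sum(count(s) for synonyms in preferred.values() for s in synonyms)
    total = good + bad
    return good / total if total else 1.0


if __name__ == "__main__":
    terms = {"ball screw": ["ballscrew"], "emergency stop": ["e-stop", "kill switch"]}
    text = "Inspect the ball screw. Press the E-stop, then the emergency stop reset."
    print(round(terminology_adherence(text, terms), 3))  # 0.667
```

A ratio rather than a raw count keeps the indicator comparable across documents of different length, which matters if it is to be shared across companies.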
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Rojek, I.; Małolepsza, O.; Kozielski, M.; Mikołajewski, D. Natural Language Processing in Generating Industrial Documentation Within Industry 4.0/5.0. Appl. Sci. 2025, 15, 12662. https://doi.org/10.3390/app152312662

