Search Results (31)

Search Parameters:
Keywords = xml file

41 pages, 1212 KiB  
Article
Detection of Malicious Office Open Documents (OOXML) Using Large Language Models: A Static Analysis Approach
by Jonas Heß and Kalman Graffi
J. Cybersecur. Priv. 2025, 5(2), 32; https://doi.org/10.3390/jcp5020032 - 11 Jun 2025
Viewed by 755
Abstract
The increasing prevalence of malicious Microsoft Office documents poses a significant threat to cybersecurity. Conventional detection methods often rely on prior knowledge of the document or of the exploitation method employed, enabling signature-based or rule-based approaches. Given the accelerated pace of change in the threat landscape, these methods cannot adapt effectively. Existing machine learning approaches can identify sophisticated features that predict a file's nature and achieve adequate results on existing samples, but they are seldom prepared for new, advanced malware techniques. This paper proposes a novel approach to detecting malicious Microsoft Office documents by leveraging large language models (LLMs). The method extracts textual content from Office documents and uses the advanced natural language processing capabilities of LLMs to analyse the documents for potentially malicious indicators. As a supplement to contemporary antivirus software, it can already assist in the analysis of malicious Microsoft Office documents by identifying and summarising potentially malicious indicators grounded in evidence; with advancing technology it may become more effective and may soon surpass tailored machine learning algorithms, even without signatures or detection rules. As such, it is not limited to Office Open XML documents but can be applied to any maliciously exploitable file format. The extensive knowledge base and rapid analytical abilities of a large language model enable not only the assessment of extracted evidence but also the contextualisation and referencing of information to support the final decision. We demonstrate that Claude 3.5 Sonnet by Anthropic, provided with a substantial quantity of raw data, equivalent to several hundred pages, can identify individual malicious indicators within an average of five to nine seconds and generate a comprehensive static analysis report, at an average cost of USD 0.19 per request and with an F1-score of 0.929.
(This article belongs to the Section Security Engineering & Applications)
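
A minimal sketch of the extraction step this approach depends on, assuming nothing beyond the Python standard library: OOXML documents are ZIP archives of XML parts, so the textual evidence an LLM would analyse can be unpacked directly. The indicator list and the function name are illustrative assumptions, not the authors' implementation.

```python
import zipfile

# Part-name markers often associated with active content (assumed heuristics).
SUSPICIOUS_MARKERS = ("vbaProject.bin", "oleObject", "activeX")

def extract_ooxml_evidence(path: str) -> dict:
    """Collect part names and raw XML text from an OOXML document."""
    evidence = {"parts": [], "suspicious": [], "xml_text": []}
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            evidence["parts"].append(name)
            if any(marker in name for marker in SUSPICIOUS_MARKERS):
                evidence["suspicious"].append(name)
            if name.endswith((".xml", ".rels")):
                evidence["xml_text"].append(zf.read(name).decode("utf-8", "replace"))
    return evidence

if __name__ == "__main__":
    ev = extract_ooxml_evidence("sample.docx")
    print(f"{len(ev['parts'])} parts, suspicious: {ev['suspicious']}")
    # The concatenated XML text would then be embedded in an LLM prompt asking
    # for evidence-based malicious indicators, as the paper describes.
```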

37 pages, 10792 KiB  
Article
Three-Dimensional Visualization of Articulated Mechanisms: Coupling of Their Dynamic and Virtual Models Using the Example of Driving of the Omnidirectional Mobile Robot
by Vjekoslav Damic and Maida Cohodar Husic
Appl. Sci. 2025, 15(9), 5179; https://doi.org/10.3390/app15095179 - 7 May 2025
Viewed by 630
Abstract
This paper proposes a novel approach to the virtual 3D modeling of articulated mechanisms. Following the widespread use of XML (eXtensible Markup Language) across applications, it defines a version of XML specially designed for describing 3D geometric models of articulated bodies. It also shows how the 3D geometric model of a mechanism can be developed incrementally from suitably defined elements and stored in a corresponding XML file. The developed XML model is processed, and the corresponding virtual model is built with the VTK (Visualization Toolkit) library and shown on the computer screen. To drive the virtual model, the dynamic model of the mechanism is developed using Bond Graph modeling techniques. The virtual 3D geometric and dynamic models are created with the corresponding software packages, BondSim3D 2023 Visual and BondSim 2023, and are interconnected by a two-way named pipe. During simulation of the dynamic model, the parameters needed to drive the virtual model (e.g., the joint displacements) are collected and sent to the virtual model over the pipe. When the virtual model receives a package, the computer screen is updated to show the new state of the mechanism. The approach is demonstrated on a holonomic omnidirectional mobile robot.
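
A minimal sketch of the rendering side of this pipeline, assuming a hypothetical XML vocabulary (<Body> elements with size and position attributes); the paper defines its own, richer XML for articulated bodies. The VTK calls are standard Python VTK.

```python
import xml.etree.ElementTree as ET
import vtk

DOC = """<Mechanism>
  <Body name="base"  size="0.4 0.4 0.1" position="0 0 0"/>
  <Body name="link1" size="0.1 0.1 0.5" position="0 0 0.3"/>
</Mechanism>"""

renderer = vtk.vtkRenderer()
for body in ET.fromstring(DOC).iter("Body"):
    sx, sy, sz = (float(v) for v in body.get("size").split())
    px, py, pz = (float(v) for v in body.get("position").split())
    cube = vtk.vtkCubeSource()                 # one box per articulated body
    cube.SetXLength(sx); cube.SetYLength(sy); cube.SetZLength(sz)
    cube.SetCenter(px, py, pz)
    mapper = vtk.vtkPolyDataMapper()
    mapper.SetInputConnection(cube.GetOutputPort())
    actor = vtk.vtkActor()
    actor.SetMapper(mapper)
    renderer.AddActor(actor)

# Joint displacements received over the named pipe would update the actor
# transforms between renders; here the scene is simply shown once.
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)
window.Render()
interactor.Start()
```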

19 pages, 4324 KiB  
Article
Research on the Construction Method of an Assembly Knowledge Graph for a Biomass Heating System
by Zuobin Chen, Fukun Wang, Yong Gao, Jia Ai and Ya Mao
Processes 2025, 13(1), 11; https://doi.org/10.3390/pr13010011 - 24 Dec 2024
Viewed by 951
Abstract
In the complex process of assembling biomass heating systems, traditional paper documents and construction process cards provide weak information correlation and slow information retrieval, which seriously restricts assembly efficiency and quality. Moreover, the assembly process involves numerous components and complex processes, which traditional management methods struggle to handle. To address this issue, a knowledge graph-based assembly information integration method is proposed that integrates scattered assembly information into a graph database, providing pathways for accessing assembly information and assisting on-site management. The biomass heating system assembly knowledge graph (BAKG) is constructed top-down. After the upper schema layer was built, the 3DXML file was parsed with the XML.dom parser in Python 3.7.16 to extract the equipment structure information, and the RoBERTa-BiLSTM-CRF model was applied to named entity recognition in the assembly documents, improving the accuracy of entity recognition. The experimental results show that the F1 score of the RoBERTa-BiLSTM-CRF model in entity recognition during the assembly process reaches 92.19%, 3.1% higher than that of the traditional BERT-BiLSTM-CRF model. Moreover, the knowledge graph structure generated from the equipment structure data in the 3DXML file resembles the equipment structure tree but is clearer and more intuitive. Finally, taking the second-phase construction process records of a company as an example, BAKG was constructed and the assembly information was stored as graphs in the Neo4j graph database, verifying the effectiveness of the method.
(This article belongs to the Special Issue Transfer Learning Methods in Equipment Reliability Management)
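
A minimal sketch of the structure-extraction step, using xml.dom.minidom (the parser family named in the abstract) on a simplified, assumed assembly tree and emitting Cypher statements for Neo4j; real 3DXML uses its own, much richer schema.

```python
from xml.dom import minidom

DOC = """<Assembly name="boiler">
  <Component name="furnace"><Component name="grate"/></Component>
  <Component name="feed_system"/>
</Assembly>"""

def emit_cypher(node, parent=None, out=None):
    """Recursively turn the assembly tree into MERGE statements for Neo4j."""
    out = out if out is not None else []
    name = node.getAttribute("name")
    out.append(f'MERGE (:Part {{name: "{name}"}})')
    if parent:
        out.append(f'MATCH (p:Part {{name: "{parent}"}}), (c:Part {{name: "{name}"}}) '
                   f'MERGE (p)-[:HAS_PART]->(c)')
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            emit_cypher(child, name, out)
    return out

root = minidom.parseString(DOC).documentElement
print("\n".join(emit_cypher(root)))  # run these in a Neo4j session to store the tree
```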

19 pages, 5250 KiB  
Article
Research on Parameter Correction of Distribution Network Model Based on CIM Model and Measurement Data
by Ke Zhou, Lifang Wu, Biyun Zhang, Ruotian Yao, Hao Bai, Weichen Yang and Min Xu
Energies 2024, 17(15), 3611; https://doi.org/10.3390/en17153611 - 23 Jul 2024
Viewed by 941
Abstract
The construction of an energy distribution network can improve the system's ability to absorb new energy sources, so its stable and efficient operation has become increasingly important. Security and stability analysis of a distribution network requires accurate network model parameters. At present, the deployment of synchronous phasor measurement units (PMUs) in China's distribution networks is limited, which makes parameter correction challenging. This paper proposes an automatic parameter-correction algorithm based on the CIM model and measurement data for distribution network systems without PMUs. First, a distribution network topology construction technique based on XML files and key fields of the distribution network is proposed, and devices with zero or negligible impedance (such as switches) are merged and reduced. A breadth-first traversal algorithm verifies the connectivity of the constructed topology. Then, building on the constructed topology and the least squares method, an iterative parameter-correction technique is developed. Finally, the accuracy and effectiveness of the proposed algorithm are verified on a standard IEEE 33-bus distribution network and on an example from the China Southern Power Grid. The topology connections constructed from the CIM model significantly improve the efficiency of parameter correction.
(This article belongs to the Section F: Electrical Engineering)
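
A minimal sketch of the topology-construction and connectivity check, under assumed element names; a real CIM RDF/XML export is far richer, and zero-impedance devices such as switches would be merged before this step, as the paper describes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

DOC = """<Network>
  <Line from="bus1" to="bus2"/>
  <Line from="bus2" to="bus3"/>
  <Line from="bus3" to="bus4"/>
</Network>"""

# Build an undirected bus graph from the XML description.
graph = defaultdict(set)
for line in ET.fromstring(DOC).iter("Line"):
    a, b = line.get("from"), line.get("to")
    graph[a].add(b)
    graph[b].add(a)

def is_connected(graph: dict) -> bool:
    """Breadth-first traversal: every bus must be reachable from the first."""
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in graph[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(graph)

print(is_connected(graph))  # True for the sample topology
```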

11 pages, 558 KiB  
Article
Fuzzy Classification Approach to Select Learning Objects Based on Learning Styles in Intelligent E-Learning Systems
by Ibtissam Azzi, Abdelhay Radouane, Loubna Laaouina, Adil Jeghal, Ali Yahyaouy and Hamid Tairi
Informatics 2024, 11(2), 29; https://doi.org/10.3390/informatics11020029 - 15 May 2024
Cited by 2 | Viewed by 1850
Abstract
In e-learning systems, although the automatic detection of learning styles is considered the key element of the adaptation process, it is not the end goal of that process. To accomplish adaptation, it is also necessary to automatically select learning objects according to the detected styles. Classification techniques are the most widely used way to select learning objects automatically by processing data derived from learning object metadata. Several such approaches obtain considerable results by mapping learning objects onto different teaching strategies and then mapping those strategies onto the identified learning styles. However, these approaches have limitations related to robustness: they do not map learning object metadata elements directly to learning style dimensions, and they ignore the fuzzy nature of learning objects, since any learning object can suit different learning styles to varying degrees. This highlights the need for a remedy. Our work concerns the automatic selection of learning objects, and we propose an approach that uses fuzzy classification to select learning objects based on learning styles. In this approach, the metadata of each learning object complying with the Institute of Electrical and Electronics Engineers (IEEE) standard are stored in a database as an Extensible Markup Language (XML) file. The Fuzzy C-Means algorithm is used both to assign fuzzy suitability rates to the stored learning objects and to cluster them into the categories of the Felder and Silverman learning styles model. The experimental results show the performance of our approach.
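
A minimal numpy sketch of the Fuzzy C-Means idea, assuming toy two-dimensional feature vectors as stand-ins for features derived from IEEE LOM metadata; the mapping of clusters to Felder-Silverman categories is likewise assumed. The point is that each object receives a degree of membership in every cluster rather than a hard label.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain Fuzzy C-Means: returns cluster centers and fuzzy memberships."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))                # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)             # rows sum to 1
    return centers, U

# Toy LOM-derived features, e.g. (interactivity level, media richness).
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.5, 0.5]])
centers, U = fuzzy_c_means(X)
print(np.round(U, 2))  # the middle object belongs partly to both clusters
```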

16 pages, 4290 KiB  
Article
Implementing an Agent-Based Modeling Approach for Protein Glycosylation in the Golgi Apparatus
by Christian Jetschni and Peter Götz
Fermentation 2023, 9(9), 849; https://doi.org/10.3390/fermentation9090849 - 15 Sep 2023
Cited by 1 | Viewed by 1962
Abstract
Glycoproteins are involved in various significant biological processes and have critical functions in physiology and pathology, regulating biological activities and molecular signaling pathways. The variety of enzymes involved in protein glycosylation and the wide diversity of the resulting glycoproteins make simulating these processes in silico a challenging task. This study aimed to establish and define the structures necessary to simulate N-glycosylation in silico. We represent the glycosylation process in the Golgi apparatus as an agent-based model with defined movement patterns and reaction rules between the participating proteins and enzymes, which act as agents. The Golgi structure is converted into a grid of 150 × 400 patches representing four compartments, each containing a specific distribution of the fundamental enzymes contributing to glycosylation. The interacting glycoproteins and membrane-bound enzymes are modeled as agents with their own rules for movement, complex formation, biochemical reaction and dissociation. The resulting structures were saved in XML format, as a mass spectrometry file and as a GlycoWorkbench2-compatible file for visualization.
(This article belongs to the Special Issue Modeling Methods for Fermentation Processes)
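
A minimal sketch of the agent-based scheme, keeping the abstract's 150 × 400 patch grid but replacing the compartmentalised enzyme distributions and reaction rules with a toy random walk and a single assumed encounter rule.

```python
import random

GRID_W, GRID_H = 400, 150  # patch grid dimensions from the abstract

class Agent:
    def __init__(self, kind):
        self.kind = kind
        self.x, self.y = random.randrange(GRID_W), random.randrange(GRID_H)

    def step(self):
        # Random walk on the grid with wrap-around boundaries.
        self.x = (self.x + random.choice((-1, 0, 1))) % GRID_W
        self.y = (self.y + random.choice((-1, 0, 1))) % GRID_H

agents = ([Agent("glycoprotein") for _ in range(50)]
          + [Agent("enzyme") for _ in range(20)])
reactions = 0
for _ in range(1000):
    occupied = {}
    for a in agents:
        a.step()
        # Toy rule: a reaction fires when a glycoprotein and an enzyme
        # land on the same patch (compared against the first occupant only).
        other = occupied.setdefault((a.x, a.y), a)
        if other is not a and other.kind != a.kind:
            reactions += 1
print(f"{reactions} encounter events")
```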

14 pages, 6187 KiB  
Concept Paper
Communication of Design Data in Manufacturing Democratization
by Bhairavsingh Ghorpade and Shivakumar Raman
J. Manuf. Mater. Process. 2023, 7(3), 108; https://doi.org/10.3390/jmmp7030108 - 1 Jun 2023
Viewed by 1766
Abstract
Part design is the principal means of communicating design intent to manufacturing and inspection. Design data are often communicated through computer-aided design (CAD) systems. Modern analytics tools and the integration of artificial intelligence into manufacturing have significantly advanced machine recognition of design specifications and manufacturing constraints. These algorithms require data that are uniformly structured and easily consumable; however, design data are represented in a graph structure and are nonuniform, which limits the use of machine learning algorithms for a variety of tasks. This paper proposes an algorithm for extracting dimensional data from three-dimensional (3D) part designs in a structured manner. The algorithm extracts face dimensions and their relationships with other faces, enabling the recognition of underlying patterns and expanding the applicability of machine learning to various tasks. The extracted part dimensions can be stored in a dimension-based numeric extensible markup language (XML) file, allowing easy storage in machine-readable form. The resulting XML file provides a dimensional representation of the part data based on their features. The proposed algorithm reads and extracts dimensions for each face of the part design, preserving dimensional and face relevance. The uniform structure of the design data facilitates processing by machine learning algorithms, enabling the detection of hidden patterns and the development of pattern-based predictive algorithms.
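
A minimal sketch of the structured export this paragraph describes, writing per-face dimensions and face adjacency to an XML file with the standard library; the schema shown (<Part>, <Face> and their attributes) is an illustrative assumption, not the paper's exact layout.

```python
import xml.etree.ElementTree as ET

# Toy data standing in for dimensions read from a CAD kernel.
faces = [
    {"id": "F1", "width": 40.0, "height": 20.0, "adjacent": "F2"},
    {"id": "F2", "width": 20.0, "height": 10.0, "adjacent": "F1"},
]

part = ET.Element("Part", name="bracket")
for f in faces:
    # One element per face, preserving the face-to-face relationship.
    ET.SubElement(part, "Face", id=f["id"], width=str(f["width"]),
                  height=str(f["height"]), adjacent=f["adjacent"])

ET.ElementTree(part).write("bracket_dimensions.xml",
                           xml_declaration=True, encoding="utf-8")
```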

22 pages, 4531 KiB  
Article
Comparative Study of Moodle Plugins to Facilitate the Adoption of Computer-Based Assessments
by Milagros Huerta, Juan Antonio Caballero-Hernández and Manuel Alejandro Fernández-Ruiz
Appl. Sci. 2022, 12(18), 8996; https://doi.org/10.3390/app12188996 - 7 Sep 2022
Cited by 7 | Viewed by 4429
Abstract
The use of Learning Management Systems (LMS) has grown rapidly over the last decades, and great efforts have recently been made to assess online students' performance due to the COVID-19 pandemic. Faculty members with limited experience in LMS such as Moodle, Edmodo, MOOC platforms, Blackboard and Google Classroom face challenges when creating online tests. This paper presents a descriptive and comparative study of the existing plugins for importing questions into Moodle, classifying them according to the computing resources they require. Each class was compared and ranked, and features such as support for gamification and the option to create parameterised questions are explored. Parameterised questions can generate a large number of distinct questions, which is very useful for large classes and discourages fraudulent behaviour. The paper also presents an open-source plugin developed by the authors, FastTest PlugIn, recently approved by Moodle, as a promising alternative that mitigates the limitations detected in the analysed plugins. FastTest PlugIn was validated in seminars with 230 faculty members, with positive results regarding expectations and potential recommendations. The features of the main alternative plugins are discussed and compared, describing the potential advantages of FastTest PlugIn.
(This article belongs to the Collection The Application and Development of E-learning)
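
A minimal sketch of the parameterised-question idea, generating several numeric variants of one template in Moodle's XML question-import format; the template and question type are illustrative assumptions, unrelated to the internals of FastTest PlugIn.

```python
import random
import xml.etree.ElementTree as ET

quiz = ET.Element("quiz")
for i in range(5):  # five parameterised variants of one template
    a, b = random.randint(2, 9), random.randint(2, 9)
    q = ET.SubElement(quiz, "question", type="shortanswer")
    name = ET.SubElement(ET.SubElement(q, "name"), "text")
    name.text = f"product-{i}"
    qtext = ET.SubElement(ET.SubElement(q, "questiontext", format="html"), "text")
    qtext.text = f"<p>What is {a} x {b}?</p>"
    ans = ET.SubElement(q, "answer", fraction="100")  # fully correct answer
    ET.SubElement(ans, "text").text = str(a * b)

ET.ElementTree(quiz).write("questions.xml", xml_declaration=True, encoding="utf-8")
# The resulting file can be imported via Moodle's "Moodle XML format" importer.
```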

30 pages, 911 KiB  
Article
Revisiting the Detection of Lateral Movement through Sysmon
by Christos Smiliotopoulos, Konstantia Barmpatsalou and Georgios Kambourakis
Appl. Sci. 2022, 12(15), 7746; https://doi.org/10.3390/app12157746 - 1 Aug 2022
Cited by 13 | Viewed by 6501
Abstract
This work attempts to clearly answer the following key questions regarding the optimal initialization of the Sysmon tool for identifying Lateral Movement (LM) in the MS Windows ecosystem. First, from an expert's standpoint and with reference to the relevant literature, what are the criteria for determining the possibly optimal initialization features of the Sysmon event monitoring tool that can be applied as custom rules within the config.xml configuration file? Second, based on the identified features, how can a functional configuration file, able to identify as many LM variants as possible, be generated? To answer these questions, we relied on the MITRE ATT&CK knowledge base of adversary tactics and techniques and focused on the execution of the nine most common LM methods. The experiments, performed on a properly configured testbed, suggested a large number of interrelated networking features, which were implemented as custom rules in Sysmon's config.xml file. Moreover, capitalizing on the rich corpus of 870K Sysmon logs collected, we created and evaluated, in terms of TP and FP rates, an extensible Python .evtx file analyzer, dubbed PeX, which can be used to automate the parsing and scrutiny of such voluminous files. Both the .evtx log dataset and the developed PeX tool are publicly available to propel future research in this interesting and rapidly evolving field.
(This article belongs to the Special Issue Advanced Technologies in Data and Information Security II)
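
A minimal sketch of the kind of log scrutiny PeX automates, assuming a Sysmon event already exported to XML: network-connection events (EventID 3) to SMB port 445, a port commonly abused for lateral movement, are flagged. The single-port rule and the sample event are illustrative; PeX itself parses raw .evtx files against far richer rule sets.

```python
import xml.etree.ElementTree as ET

# Standard namespace of Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>3</EventID></System>
  <EventData>
    <Data Name="DestinationPort">445</Data>
    <Data Name="Image">C:\\Windows\\System32\\cmd.exe</Data>
  </EventData>
</Event>"""

event = ET.fromstring(SAMPLE)
event_id = event.findtext("e:System/e:EventID", namespaces=NS)
data = {d.get("Name"): d.text for d in event.iterfind("e:EventData/e:Data", NS)}
if event_id == "3" and data.get("DestinationPort") == "445":
    print(f"possible lateral movement: {data.get('Image')} -> port 445")
```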

19 pages, 5331 KiB  
Article
Geographical Data and Metadata on Land Administration in Spain
by Gaspar Mora-Navarro, Carmen Femenia-Ribera, Joan Manuel Velilla Torres and Jose Martinez-Llario
Land 2022, 11(7), 1107; https://doi.org/10.3390/land11071107 - 19 Jul 2022
Cited by 5 | Viewed by 4191
Abstract
Spain has a tax-oriented cadastre, with legal data about properties (ownership, rights, liens, charges, and restrictions) recorded in a separate property rights registry (henceforth called the land registry). This paper describes the Spanish cadastre and land registry, focusing on the new coordination system established by Law 13/2015. Since Law 13/2015 came into force, cadastral cartography has been the basis for knowing where land registry units are located. The coordination system defines a procedure for updating the cadastral parcel boundary of a property when it does not match reality. In these cases, an independent professional land surveyor submits the new property boundary via the Internet so that the corresponding cadastral parcel boundary can be updated. Currently, neither the cadastre nor the land registry stores geographical metadata for each property boundary in a standardised way. As boundaries delimit individual properties, boundary metadata indicate the accuracy with which such ownership rights are recorded. We propose that, for these boundary updates, the Spanish cadastre also allow the upload of qualitative and quantitative instances of the data quality class of the Spanish Metadata Core standard, and that this information be made available to users, for example in an XML file. These metadata would provide justified information about how a boundary was obtained and its accuracy. Software has been developed to manage the metadata of each property boundary so that we can evaluate whether this information is useful. We present conclusions from several real-life property delimitation tests.

11 pages, 17964 KiB  
Data Descriptor
Dataset: Roundabout Aerial Images for Vehicle Detection
by Enrique Puertas, Gonzalo De-Las-Heras, Javier Fernández-Andrés and Javier Sánchez-Soriano
Data 2022, 7(4), 47; https://doi.org/10.3390/data7040047 - 12 Apr 2022
Cited by 14 | Viewed by 6529
Abstract
This publication presents a dataset of aerial images of Spanish roundabouts taken from a UAV, along with annotations in PASCAL VOC XML files indicating the positions of vehicles within them. A CSV file is also attached containing information on the location and characteristics of the captured roundabouts. This work details the process followed to obtain the data: image capture, processing, and labeling. The dataset consists of 985,260 instances in total: 947,400 cars, 19,596 cycles, 2262 trucks, 7008 buses, and 2208 empty roundabouts in 61,896 JPG images of 1920 × 1080 px. These are divided into 15,474 images extracted from 8 roundabouts with different traffic flows and 46,422 images created using data augmentation techniques. The purpose of this dataset is to support research on computer vision for roads, since such labeled images are not abundant. It can be used to train supervised learning models, such as the convolutional neural networks that are very popular in object detection.
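
A minimal sketch of consuming the dataset's PASCAL VOC annotations: each XML file carries one <object> element per vehicle with a class name and a pixel bounding box. The snippet below is a trimmed, assumed example of such a file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<annotation>
  <filename>roundabout_0001.jpg</filename>
  <object><name>car</name>
    <bndbox><xmin>412</xmin><ymin>230</ymin><xmax>461</xmax><ymax>275</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(SAMPLE)
for obj in root.iter("object"):
    label = obj.findtext("name")
    # Bounding box in pixel coordinates: (xmin, ymin, xmax, ymax).
    box = [int(obj.findtext(f"bndbox/{k}")) for k in ("xmin", "ymin", "xmax", "ymax")]
    print(label, box)   # e.g. car [412, 230, 461, 275]
```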

19 pages, 780 KiB  
Article
Ransomware-Resilient Self-Healing XML Documents
by Mahmoud Al-Dwairi, Ahmed S. Shatnawi, Osama Al-Khaleel and Basheer Al-Duwairi
Future Internet 2022, 14(4), 115; https://doi.org/10.3390/fi14040115 - 7 Apr 2022
Cited by 13 | Viewed by 4101
Abstract
In recent years, platforms of all kinds have witnessed an unprecedented increase in ransomware attacks targeting hospitals, governments, enterprises, and end-users. These attacks maliciously encrypt documents and files on infected machines, depriving victims of access to their data, after which the attackers demand a ransom in return for restoring access to the legitimate owners; hence the name. This cybersecurity threat causes substantial financial losses and wasted time for affected organizations and users. A great deal of research across academia and industry has sought to combat this threat and mitigate its danger, and these ongoing efforts have produced several detection and prevention schemes. Nonetheless, these approaches do not cover all possible risks of data loss. In this paper, we address this facet and provide an efficient solution ensuring the recovery of XML documents from ransomware attacks. We propose a self-healing version-aware ransomware recovery (SH-VARR) framework for XML documents. The framework is based on the novel idea of using the link concept to maintain file versions in a distributed manner while applying access-control mechanisms to protect these versions from being encrypted or deleted. The proposed SH-VARR framework is experimentally evaluated in terms of storage overhead, time requirement, CPU utilization, and memory usage. Results show that the snapshot size increases proportionally with the original size; that the time required is less than 120 ms for files smaller than 1 MB; and that the highest CPU utilization occurs with bzip2. Moreover, with zip and gzip the memory usage is almost constant (around 6.8 KB), whereas it increases to around 28 KB with bzip2.
(This article belongs to the Topic Cyber Security and Critical Infrastructures)
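
A toy sketch of the two ingredients SH-VARR combines, versioned snapshots plus access-control protection, using plain POSIX permissions; the framework's distributed, link-based versioning is only approximated here by timestamped copies, and all paths are assumptions.

```python
import os
import shutil
import stat
import time

def snapshot(path: str, vault: str = "versions") -> str:
    """Copy the current XML document into a write-protected, timestamped version."""
    os.makedirs(vault, exist_ok=True)
    version = os.path.join(vault, f"{os.path.basename(path)}.{int(time.time())}")
    shutil.copy2(path, version)                     # preserve file metadata
    # Read-only bits: an in-place encryption attempt on the version fails.
    os.chmod(version, stat.S_IRUSR | stat.S_IRGRP)
    return version

v = snapshot("invoice.xml")
print(f"recoverable version stored at {v}")
```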

7 pages, 1238 KiB  
Data Descriptor
Dataset: Variable Message Signal Annotated Images for Object Detection
by Enrique Puertas, Gonzalo De-Las-Heras, Javier Sánchez-Soriano and Javier Fernández-Andrés
Data 2022, 7(4), 41; https://doi.org/10.3390/data7040041 - 1 Apr 2022
Cited by 3 | Viewed by 3774
Abstract
This publication presents a dataset of Spanish road images taken from inside a vehicle, together with annotations in PASCAL VOC XML files indicating the locations of Variable Message Signals (VMSs) within them. A CSV file is also attached with information on the geographic position, the folder containing each image and the displayed Spanish text. The dataset can be used to train supervised learning computer vision algorithms, such as convolutional neural networks. This work details the process followed to obtain the dataset, from image acquisition to labeling, along with its specifications. The dataset comprises 1216 instances, 888 positive and 328 negative, in 1152 JPG images with a resolution of 1280 × 720 pixels, divided into 756 real images and 756 images created through data augmentation. The purpose of this dataset is to help road computer vision research, since no dataset exists specifically for VMSs.

21 pages, 3813 KiB  
Article
A gbXML Reconstruction Workflow and Tool Development to Improve the Geometric Interoperability between BIM and BEM
by Yikun Yang, Yiqun Pan, Fei Zeng, Ziran Lin and Chenyu Li
Buildings 2022, 12(2), 221; https://doi.org/10.3390/buildings12020221 - 16 Feb 2022
Cited by 32 | Viewed by 7308
Abstract
BIM-based building energy simulation plays an important role in sustainable design on the track to a net-zero-carbon building stock by 2050. However, issues with BIM-BEM interoperability make the design process inefficient and less automatic: insufficient semantic information may lead to inaccurate results, while error-prone geometry can terminate the simulation engine. Defective models and authoring tools that lag behind the standard often prevent the creation of clean geometry acceptable to the simulation engine. This project develops a workflow that helps document lightweight geometry in gbXML format. The implemented workflow bypasses modeling inaccuracies and irrelevant details by reconstructing the model from extrusions on patched floor plans. Compared with gbXML files exported by BIM authoring tools, the resulting gbXML is more lightweight and has airtight space boundaries. The gbXML has been further tested against EnergyPlus to demonstrate its ability to support a seamless geometry exchange between BIM and BEM.
(This article belongs to the Special Issue AI-Aided Carbon Engineering in the AEC Industry)
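
A minimal sketch of consuming the kind of gbXML this workflow produces, listing spaces and their floor areas; the namespace is the real gbXML default, while the tiny document and its values are assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://www.gbxml.org/schema"}
DOC = """<gbXML xmlns="http://www.gbxml.org/schema">
  <Campus><Building id="bldg1">
    <Space id="sp1"><Name>Office</Name><Area>42.5</Area></Space>
    <Space id="sp2"><Name>Corridor</Name><Area>12.0</Area></Space>
  </Building></Campus>
</gbXML>"""

root = ET.fromstring(DOC)
for space in root.iterfind(".//g:Space", NS):
    name = space.findtext("g:Name", namespaces=NS)
    area = float(space.findtext("g:Area", default="0", namespaces=NS))
    # Space boundaries would be checked for airtightness before handing
    # the geometry off to the BEM engine.
    print(space.get("id"), name, area)
```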

13 pages, 4741 KiB  
Article
A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R
by Johannes Rainer, Andrea Vicini, Liesa Salzer, Jan Stanstrup, Josep M. Badia, Steffen Neumann, Michael A. Stravs, Vinicius Verri Hernandes, Laurent Gatto, Sebastian Gibb and Michael Witting
Metabolites 2022, 12(2), 173; https://doi.org/10.3390/metabo12020173 - 11 Feb 2022
Cited by 63 | Viewed by 11639
Abstract
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility of measuring novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, resulting in non-standardized datasets that demand customized annotation workflows. We present an ecosystem of R packages, centered around the MetaboCoreUtils, MetaboAnnotation and CompoundDb packages, that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the CompoundDb package. The ecosystem supports data in a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML and netCDF, as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure allows reproducible annotation workflows to be built, tailored to and adapted for most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, facilitating its reuse in other R packages. Finally, all packages are thoroughly unit-tested and documented, and are available on GitHub and through Bioconductor.
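
A minimal Python sketch of the MS1 annotation idea (the ecosystem itself is R, so this is a language-neutral illustration, not its API): measured m/z values are matched against a reference table within a ppm tolerance. The reference masses are standard monoisotopic [M+H]+ values; real workflows also compare retention times and MS2 spectra.

```python
# Reference m/z values for two protonated metabolites.
REFERENCE = {"glucose [M+H]+": 181.0707, "caffeine [M+H]+": 195.0877}

def annotate(mz: float, ppm: float = 10.0):
    """Return reference hits whose m/z lies within the given ppm window."""
    return [name for name, ref in REFERENCE.items()
            if abs(mz - ref) / ref * 1e6 <= ppm]

print(annotate(195.0879))  # ['caffeine [M+H]+'] at about 1 ppm deviation
```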
