Article

Long-Term Preservation of Emotion Analyses with OAIS: A Software Prototype Design Approach for Information Package Conversion in KM-EP

by Verena Schreyer *, Michael Pfalzgraf, Marco Xaver Bornschlegl and Matthias Hemmje
Faculty of Mathematics and Computer Science, University of Hagen, 58097 Hagen, Germany
* Author to whom correspondence should be addressed.
Information 2025, 16(11), 951; https://doi.org/10.3390/info16110951
Submission received: 27 August 2025 / Revised: 26 October 2025 / Accepted: 29 October 2025 / Published: 3 November 2025
(This article belongs to the Special Issue Advances in Human-Centered Artificial Intelligence)

Abstract

Ensuring the long-term preservation of complex, multimodal research data, such as video-based Emotion Analyses, is a growing challenge in digital science. Although numerous standards like OAIS, METS, and PREMIS define principles for sustainable archiving, their concrete integration into operational research environments remains limited. This paper presents a software design approach for OAIS-compliant preservation that enables automated transformation, packaging, and validation of Emotion-Data Packages within the Knowledge Management Ecosystem Portal (KM-EP). The proposed prototype converts analysis data into archival formats, generates structured METS/PREMIS metadata, and embeds integrity and authenticity checks through hash values and digital signatures. A user-centered, step-by-step configuration interface supports both export and reimport of Emotion-Data Packages. The work provides a specific use case of implementing OAIS-compliant long-term preservation in the KM-EP context.

1. Introduction and Motivation

The challenge of preserving data over the long term is faced, for instance, by the healthcare sector, where the number of video conferences increased during the global spread of COVID-19 [1,2]. Recording these sessions and applying Emotion Recognition Algorithms [3] enables the subsequent analysis and documentation of patient interviews and supports therapists in this work. Emotion Recognition in the context of this paper is conducted using machine learning (ML) algorithms that analyze video recordings. By analyzing the collected emotion sequence data, experts can not only recognize emotional outbursts but also better understand their causes [3]. Therapists might use the recorded emotions for subsequent session analysis and documentation that can be stored in the patient’s medical record. These analysis results and documents need to be preserved and managed in the long term.
To this end, the ISO standard ISO 14721:2025 for an Open Archival Information System (OAIS) [4] acts as a Reference Model Framework that establishes standards and guidelines for the long-term storage of and access to digital information. The OAIS Reference Model Framework includes best practices for data storage, metadata management, data replication, security measures, and much more to ensure the integrity and longevity of digital resources [5]. Storing information in digital form poses a significant challenge that goes far beyond storage on paper or film. In particular, the rapid obsolescence of digital technologies harbors considerable technical risks that can threaten the ability to restore, reproduce, or interpret information. Strategies for risk avoidance or minimization are addressed and described in detail within the OAIS Reference Model Framework [5].
The R&D project presented in this paper is situated within the research project introduced by Schreyer et al. [6], addresses its third Research Question (RQ), and will also be implemented in the Knowledge Management Ecosystem Portal (KM-EP). The aim of the R&D project is to investigate how an existing software prototype can be extended to include an interface that applies standards like OAIS, METS, and PREMIS to a specific use case. Therefore, the existing software prototype [6] will be extended for the long-term preservation of packages of video recordings, emotion sequence data, and session analysis data and documents (in the following referred to as Emotion-Data Packages (EDPs)) to ensure accessibility and interpretability in the long term using the OAIS Reference Model Framework [5]. Long-term preservation refers to the process of permanently storing and preserving digital data over long periods, often decades or even centuries. Appropriate measures are taken to ensure that the data remain intact and accessible during this time, even if technologies and file formats change [5].
KM-EP currently lacks effective archiving solutions for EDPs. The implementation needs to take account of the diversity and complexity of the requirements for data transformation according to the OAIS Reference Model Framework. This challenge affects the design and development of an effective archiving solution in KM-EP to ensure the long-term integrity and accessibility of EDPs according to the OAIS Reference Model Framework.
How can we provide data transformations in the KM-EP to prepare Emotion-Data Packages for long-term preservation in accordance with the OAIS Reference Model Framework and at the same time enable sustainable reimports of archived packages into KM-EP while ensuring the integrity and completeness of the archived data? (Research Question 1)
In this context, it is crucial that the data are stored efficiently to optimize storage space and improve transmission speed. Data Compression, as defined by Salomon et al. [7], is an essential aspect of this. “Data compression is the process of converting an input data (…) into another data stream (…) that has a smaller size.” [7]. Data Compression reduces the storage space requirements of data by removing redundant or non-essential information [7]. However, this compressed data stream can only be used again once it has been decompressed, i.e., converted back into its original format. Data Decompression is the process by which the compressed data stream is returned to its original state to enable complete and correct reconstruction of the original data [8]. KM-EP currently lacks effective Data Compression and Data Decompression techniques for the components of EDPs to reduce storage space requirements and improve archiving efficiency.
How can we provide Data Compression and Data Decompression within the KM-EP to support reliable long-term archiving of Emotion-Data Packages and ideally reduce storage requirements and enable reuse of emotion-data within KM-EP? (Research Question 2)
There is currently no mechanism within KM-EP for exporting and importing EDPs according to the OAIS Reference Model Framework using step-by-step KM-EP user-interface dialogs. This gap leads to challenges in the seamless integration of EDPs, which contain a variety of scientific information in different file formats, into the long-term archiving processes. Given the complexity and diversity of the EDPs, the development of effective solutions to ensure smooth data exchange and correct archiving is crucial to guarantee the integrity and accessibility of archived information in the long term. There are currently no effective mechanisms available for the export of EDPs from KM-EP and for the import of EDPs into KM-EP compliant with the OAIS Reference Model Framework.
How can we provide export and import of Emotion-Data Packages in the KM-EP in compliance with the OAIS Reference Model Framework using step-by-step KM-EP user-interface dialogs and ensuring data integrity and accessibility? (Research Question 3)
To address the research questions, the multimethodological approach for information systems research by Nunamaker et al. [9] will be used, as in the research project [6]. This approach includes four research strategies that structure this paper. The second section follows the phase of observation to evaluate the current state of the art in science and technology. The third section presents the results of the phase of theory building, where innovative ideas and concepts are developed and conceptual frameworks are constructed. The phase of system development, and thus the fourth section, consists of system design methods using prototyping, development of the product, and technology transfer. The fifth section covers the last strategy, the phase of experimentation, which applies experiments and simulations to validate methods and theories. The last section will summarize the paper and give an outlook on future work [9].

2. State of the Art in Science and Technology

An overview of the relevant terms and existing technologies and models will be provided by presenting the current state of science and technology regarding the long-term preservation of EDPs. This is part of the goals of the observation phase according to the methodological framework by Nunamaker et al. [9]. To do so, the current state of the art in science and technology will be observed to identify remaining challenges (RCs), which will be summarized at the end of each subsection.

2.1. OAIS and Digital Preservation

The Open Archival Information System (OAIS) Reference Model Framework [4] aims to address the obsolescence of hardware and software in the digital age, which might cause information to no longer be available and preservable over long time periods. To make information available and accessible, and to preserve its integrity in the long term, organizations need to follow the Reference Model Framework for an OAIS. The OAIS Reference Model Framework is an implementable framework providing a set of interoperable protocols and interface specifications. In this section, the OAIS models and data flows will be presented, as well as suitable data formats for digital preservation, after introducing some terms [5].
An archive in the context of an OAIS Reference Model Framework is an organization that preserves information and makes it available for the designated community. Information is knowledge being exchanged in combination with representational information on how to interpret the data. Information may therefore be defined as data with meaning, as data consists of characters like letters and numbers. Therefore, representational information is needed to add meaning to data objects and to enable understanding of the context by combining structural and semantic elements [5].
To define the representational information, the knowledge base of the designated community needs to be observed. The designated community is defined as a group of users that need to access the archived information and have the necessary knowledge to understand it [5]. It is therefore essential for effective long-term preservation to define the designated community, which comprises the following groups in the context of the research project [6]:
Medical Diagnostic Experts: use the video recording of the human–human dialog and the additional emotion sequence data to support diagnosis.
Computer Science Emotion Analysis Experts: analyze whether the recognized emotion sequence data correspond to the reactions shown by the conversation participants in the video recordings, for instance to improve emotion detection.
Emotion Experts: use emotion sequence data to identify interesting parts of the conversation and analyze the emotions shown by the conversation participants in the video recordings.
According to [5], the information objects that need to be accessed by the designated community will be added to information packages which are logical containers consisting of information objects of the following types:
Content Information: set of information that is the original target of preservation consisting of a content data object and its representational information.
Preservation Description Information (PDI): contains information necessary to adequately preserve the Content Information it is associated with.
Packaging Information: describes the components of the information package and how these are connected through logical or physical conditions as well as methods to identify and extract these.
Descriptive Information: set of information supporting finding, ordering, and retrieving information by consumers.
There are diverse types of information packages depending on the current preservation process step. The data delivered by the producer is called a submission information package (SIP), which needs to be converted into an archival information package (AIP) to store the data within the archive. When the consumer requests information from the archive, the AIP needs to be converted into a dissemination information package (DIP) as a response [5].
The information packages correspond to the predefined EDPs to be archived. To provide these information packages with PDI, metadata schemas must be incorporated to ensure longevity. Therefore, PREMIS and METS were selected, as these two standards are particularly well suited for the implementation of the OAIS Reference Model Framework and support comprehensive management and preservation of digital objects [5].
PREMIS has been developed with consideration of the OAIS Reference Model Framework and provides a report with an implementable set of preservation description information (PDI) and a data dictionary with guidelines for metadata management. The data dictionary defines a data model which specifies the PREMIS entities that must be described, and which semantic units must be provided for these [10,11].
For the management of preservation metadata, the PREMIS Editorial Group has published guidelines that recommend strategically placing PREMIS elements within the METS structure, targeting redundancies, and managing structural relationships through the METS structure plan instead of the PREMIS relationship elements. By documenting specific implementation decisions, METS profiles further emphasize the importance of the adaptability of both standards for effective digital archiving strategies [11].
METS is a structured method for managing digital library objects. It defines specific sections that describe the metadata, file hierarchies, and behavior of digital objects. These sections enable the comprehensive organization, classification, management, and use of digital resources, ensuring their integrity and accessibility [11,12].
Observing the current state of the art for long-term digital preservation using the OAIS Reference Model Framework, the following remaining challenges can be identified. These will be addressed by Section 3 to model and design the system.
How does representational information need to be modeled and designed to meet the requirements of the OAIS Reference Model Framework? (RC1.1)
How does PDI need to be modeled and designed to meet the requirements of the OAIS Reference Model Framework? (RC1.2)
How do Emotion-Data Packages need to be modeled and designed to meet the requirements of the OAIS Reference Model Framework? (RC1.3)
How does data need to be transformed to enable efficient preservation and accessibility in the long-term according to the OAIS Reference Model Framework? (RC1.4)
How do metadata need to be modeled and designed to meet the requirements of the OAIS Reference Model Framework as well as METS and PREMIS standards? (RC1.5)
How does content information need to be modeled and designed to meet the requirements of the OAIS Reference Model Framework? (RC1.6)

2.2. Data Compression and Data Decompression

To archive information packages, Data Compression mechanisms are used to reduce the size of the EDPs and thus the storage requirements of an archive. Based on [8], Data Compression minimizes the amount of data required to represent digital information by efficiently exploiting both perceptual limitations of the user and existing structures in the data. Data Compression is differentiated into lossless and lossy compression. Lossless compression is important if the original data need to be exactly restored from the compressed data, as for texts and tables [8,13]. In contrast, lossy compression is often sufficient for media such as photos, videos, and audio data. This type of compression utilizes the fact that humans can only perceive limited resolutions. Insignificant information that is not perceived can therefore be omitted. This process of reducing irrelevant parts typically leads to lossy compression [13]. Data Compression is particularly relevant for the video recordings. To make the preservation of video files storage-efficient in the long term, different forms of video conversion are possible. In this R&D project, two different options will be implemented for the user to choose from during export.
One of them is archiving MP4 videos with the H.264 compression format, which achieves a significant reduction in file size while maintaining image quality [14]. This solution is suitable for archiving video files in applications with limited storage [14].
The second option is FFV1 in a Matroska container (MKV), which offers a robust archiving solution [15]. MKV is a free and open container format designed to store diverse types of multimedia content in a single file [15]. The FFV1 codec offers lossless compression, which guarantees high-quality video data [16]. However, lossless compression results in significantly larger files, making FFV1 less suitable than H.264 for applications where file size reduction is a decisive criterion.
Other options to reduce the file size without losing analysis-relevant data might be the migration of the raw videos to grayscale and, if necessary, the elimination of the audio track.
This R&D project deliberately refrains from investigating the possibility of enriching MKV files with metadata such as comments, annotations, and segments. The reason for this is that the complexity of this endeavor would go beyond the scope of the work and would entail considerable technical and conceptual risks, particularly regarding the reverse transformation from the MKV format. The main aim of this work is to design and export the EDPs in such a way that they are suitable for long-term archiving in accordance with the specifications of the OAIS Reference Model Framework.
Observing the current state of the art for Data Compression and Data Decompression, the following remaining challenges can be identified. These will be addressed by Section 3 to model and design the system.
How can video recordings be converted between MP4 and MKV for long-term preservation? (RC2.1)
How can video recordings be converted to H.264 codec to enable a compromise between minimal loss of quality and optimized file size and to meet the requirements for conditional long-term suitability? (RC2.2)
How can video recordings be converted to grayscale to reduce file size? (RC2.3)
How can audio of video recordings be removed to reduce file size? (RC2.4)
How does the variation in video compression in terms of format selection, migration to grayscale, and elimination of the audio track affect the file size? (RC2.5)

2.3. Data Integrity and Accessibility

This section focuses on the methods used to ensure data integrity when transferring data. Data integrity refers to ensuring the accuracy, consistency, and trustworthiness of data throughout their lifecycle, as these factors are critical to the reliability and usability of data in various applications. However, this section does not deal with the deliberate manipulation of data by third parties [17].
Hash functions can be used to ensure data integrity. A hash function is essentially a mathematical function that compresses data from a large or infinite domain into a fixed, smaller range [18]. Typically, cryptographic hash functions, such as SHA-256, map input strings of arbitrary length to bit strings of a fixed length, in this case 256 bits [18]. These strings are generated from digital objects and serve to verify data integrity by making it possible to reliably identify changes or errors that could occur during transmission or storage [11]. The hash codes can be recorded either in METS syntax (as an attribute of the <file> element in the file inventory section) or in PREMIS syntax (as an element within the fixity information in the object entity) [12,19]. If PREMIS is integrated in METS, simultaneous use in both formats can lead to problems [11]. It is therefore crucial that clear guidelines are formulated for such duplications to clarify which version has priority and how the system should deal with possible inconsistencies [11].
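To make the two recording options tangible, the fragments below show how a SHA-256 value might be expressed in each syntax; the identifiers, the file path, and the truncated digest are placeholders, and the exact attribute usage should follow the METS and PREMIS documentation [12,19].
<!-- Option 1: checksum attributes on a METS <file> element (placeholder values) -->
<mets:file ID="file-001" CHECKSUM="9f86d081884c7d65..." CHECKSUMTYPE="SHA-256">
  <mets:FLocat LOCTYPE="URL" xlink:href="attachments/recording.mkv"/>
</mets:file>

<!-- Option 2: fixity information within the PREMIS object entity (placeholder values) -->
<premis:fixity>
  <premis:messageDigestAlgorithm>SHA-256</premis:messageDigestAlgorithm>
  <premis:messageDigest>9f86d081884c7d65...</premis:messageDigest>
</premis:fixity>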
Another important aspect is data encryption, which guarantees both the integrity and confidentiality of the data [20]. The integrity and authenticity of data are ensured using asymmetric encryption. The sender signs the message with their private key, and the recipient can verify the signature with the sender’s public key [21]. The implementation of encryption techniques is deliberately omitted in this work, as these are not the focus of the investigation. Instead, the focus is on ensuring file integrity and implementing digital signatures in the metadata to guarantee the authenticity and immutability of the data. To this end, PREMIS supports the management of digital signatures using the “signatureInformation” tag to store detailed information about digital signatures [22]. This element is part of the “objectCharacteristicsExtension” tag and comprises several sub-tags to store information about the method used to create the digital signature, the actual value of the signature, rules or algorithms to validate the digital signature, and information about the key used to create or verify the signature [22]. By using the “signatureInformation” tag, PREMIS can ensure that all relevant information on the digital signature of a digital object is documented in a clear and structured manner [22]. This contributes significantly to the long-term preservation and trustworthiness of digital objects by making it possible to prove their authenticity and integrity in the future [10].
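As a hedged sketch of such signature metadata, the fragment below uses the sub-elements named above with placeholder values; the exact nesting within the object entity and any additionally required sub-elements should be taken from the PREMIS 3.0 data dictionary [22].
<premis:signatureInformation>
  <premis:signature>
    <premis:signatureMethod>SHA-256 with RSA</premis:signatureMethod>
    <premis:signatureValue>MEUCIQDh3v0...</premis:signatureValue>
    <premis:signatureValidationRules>Verify the signature value against the hash of the file using the exporter's public key</premis:signatureValidationRules>
    <premis:keyInformation><!-- details of the key used to create or verify the signature --></premis:keyInformation>
  </premis:signature>
</premis:signatureInformation>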
Observing the current state of the art for data integrity and accessibility, the following remaining challenges can be identified. These will be addressed by Section 3 to model and design the system.
How does the generation of hash values need to be modeled and designed to enable data integrity? (RC3.1)
How can metadata be extended to ensure the integrity of Emotion-Data Packages? (RC3.2)
How can digital signatures be modeled and designed to ensure data authenticity? (RC3.3)
How can metadata be extended to ensure data authenticity? (RC3.4)
How can the user be enabled to import and export Emotion-Data Packages using a step-by-step KM-EP user-interface dialog? (RC3.5)

3. Conceptual Modeling and Design

This section will address the remaining challenges identified in the previous section. This is part of the goals of the theory-building phase according to the methodological framework by Nunamaker et al. [9]. To be able to model and design a software prototype that focuses on the needs and requirements of users, the User-Centered Design (UCD) methodology according to Norman and Draper will be used. According to this methodology, system design consists of the four phases of context definition, requirements analysis, concept design, and evaluation. The software prototype will be modeled and designed for the use cases of import and export. For the export, users should be able to export EDPs using a step-by-step KM-EP user-interface dialog with options for video conversion, private keys, and file names. Also, users should be able to import EDPs with an automatic check for valid data, data integrity, and data authenticity. The first subsection will therefore model the data transformations and data creation to match the OAIS Reference Model Framework. The second subsection will provide models for Data Compression and Decompression. The third subsection will model processes for data integrity and authenticity, as well as user interfaces for import and export.

3.1. Data Transformation and Metadata Generation

This section will model the components and structure of the EDP. Therefore, the content information, suitable data formats, and necessary data conversions will be designed. To apply the OAIS Reference Model Framework, this section will also model the representational information, PDI, and METS/PREMIS metadata.
Addressing RC1.3, modeling the folder structure of the EDP in the form of a ZIP folder ensures that the various file types, such as XML files, video recordings, and other file attachments, can be organized and exported in a structured manner. The ZIP folder will therefore contain the following components (a possible layout is sketched after the list):
Metadata XML file containing the representational data and PDI in METS/PREMIS format;
XML file containing the database entities in the context of the emotion analysis;
XSD schema file for the database entities XML file;
Additional files of the emotion analysis like video recordings and emotion sequence data.
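A possible layout of such a package is sketched below; it uses the file names that appear later in the evaluation (Metadata.xml, DBData.xml, DBData.xsd), while the archive name, the attachments subfolder, and the attachment file names are purely illustrative.
EmotionDataPackage.zip          (ZIP name chosen by the user during export)
├── Metadata.xml                (METS/PREMIS metadata, including hash values and signatures)
├── DBData.xml                  (database entities in the context of the emotion analysis)
├── DBData.xsd                  (XSD schema for DBData.xml)
└── attachments/                (subfolder for additional files of the emotion analysis)
    ├── recording.mkv           (converted video recording)
    └── emotion_sequence.csv    (emotion sequence data)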
The content information (RC1.6) is represented through files, like video recordings or emotion sequence data, but also through exports of database entities which provide information on Emotion Analyses, such as details of conversation participants. The following files are expected to be part of the content information:
Database entities and their relations (XML);
Video recordings (MP4/MKV);
Emotion sequence data or further analysis data, like the configuration of the Emotion Recognition Algorithms (Text, CSV, and JSON).
File formats like JSON need to be converted to text files. For converted files such as JSON files and video recordings, the original must be added to the metadata in the EDP to ensure correct reimports. After successful processing, the files will be added to a ZIP folder during the export process. During the import, the files need to be converted to their original format and linked to the correct database entities (RC1.4).
To support the export and import of the database entities and to make the determination of content information flexible and configurable, a JSON configuration file for each entity will be provided, which corresponds to the following JSON model, as shown in Figure 1. This model can be used to support both the generation and import of XML-wrapped content information as well as the creation of an XSD file for the database entities.
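Since the authoritative schema is the JSON model in Figure 1, the following is only a hypothetical configuration for a single entity, with all names and fields chosen for illustration, to indicate the kind of information such a file might carry:
{
  "entity": "ConversationParticipant",
  "table": "conversation_participant",
  "attributes": [
    { "name": "id",       "type": "integer", "required": true  },
    { "name": "fullName", "type": "string",  "required": true  },
    { "name": "role",     "type": "string",  "required": false }
  ],
  "relations": [
    { "name": "analysisSeries", "target": "AnalysisSeries", "cardinality": "many-to-one" }
  ]
}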
To interpret the database entities and other content information, representational information is needed (RC1.1) which can be split up into different types [5]. The essential representational information is needed for the interpretation of a digital object and therefore contains metadata that summarize technical and semantic properties. Other types, like structural or semantic representational information, dependencies and their properties, data dictionary, field definitions, glossary, and glossary terms, are needed to interpret and process the digital objects and give them the correct meaning and context [5].
It should be noted at this point that the complete modeling of all required representational information to apply the OAIS Reference Model Framework is not the focus of this R&D project and would exceed the scope of this paper. Rather, the focus of this paper is on investigating whether an OAIS-compliant export and import of analysis objects is feasible in principle and which framework conditions are required for this. The complete elaboration of the representation information could be a promising topic for future research.
To address RC1.2, the PDI is needed to preserve the content information and can be structured in five different types:
Reference information helps to identify and search the information object [5].
Context information describes the context of the information object and relevant events or linked objects [5].
Provenance information holds information to the origin, change history, and which agent is responsible for the information [5].
Fixity information ensures the integrity of the information object by storing hash values [5].
Access right information controls access and usage rights [5].
This R&D project deliberately refrains from comprehensively considering the change history in the provenance information, as this would increase complexity in extending the software prototype and is not the focus of the research. The need to document a possible format change, as for JSON and MP4 files, is fulfilled by the introduction of the “OriginalFormat”, “NewFormat”, and “FormatChangedAt” attributes as provenance information. To reduce complexity, not all possible PDI attributes are modeled and implemented. Furthermore, the access and user rights are only given as examples and contain purely fictitious values.
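For illustration, such a documented format change might be serialized roughly as follows inside a PREMIS extension container; the wrapper element and namespace prefix are hypothetical, and only the three attribute names are taken from the design above.
<premis:objectCharacteristicsExtension>
  <kmep:formatChange> <!-- hypothetical wrapper element and namespace -->
    <kmep:OriginalFormat>application/json</kmep:OriginalFormat>
    <kmep:NewFormat>text/plain</kmep:NewFormat>
    <kmep:FormatChangedAt>2025-08-27T10:15:00Z</kmep:FormatChangedAt>
  </kmep:formatChange>
</premis:objectCharacteristicsExtension>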
To structure the PDI and representational information, the metadata XML file will be created using METS/PREMIS (RC1.5). The metadata XML file serves as an additional file for the EDP and enables the OAIS-compliant export. The metadata structure ensures that all relevant information for long-term archiving and reusability follows the OAIS Reference Model Framework. The file will be structured as shown in Figure 2. The blue-colored elements represent the METS components which are used for structuring the metadata. The green-colored elements represent the PREMIS components, and the purple-colored elements are user-defined extensions (custom elements) that are integrated via a PREMIS extension.
Regarding the OAIS information model, the contents of information packages can be mapped into the METS and PREMIS components. Information objects are represented by the METS components “fileSec” and “structMap”, which describe the physical files and their organization within the digital object. These elements reference the actual data and their structure. The representational information is mainly covered by the METS component “dmdSec”, which contains descriptive metadata necessary to interpret the content. Technical representational information of the content can also be supplemented by “structMap” for structural information and “fileSec” for information on the files. The PDI is included in elements such as “Event”, “Agent”, “rightsMD”, and “digiprovMD”, which document PDI aspects and the context of the object. The descriptive information is covered by “dmdSec”, which contains descriptive metadata that describe the digital object at a content level. The packaging information is not necessary as METS itself serves as the packaging information, as it brings together all metadata and content information in a structured and standardized format.
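Read together with Figure 2, this mapping can be skeletonized roughly as shown below; the element names follow the METS and PREMIS schemas, the nesting of PREMIS elements via mdWrap/xmlData is common practice, and all identifiers and omitted details are placeholders.
<mets:mets>
  <mets:dmdSec ID="dmd-001"> <!-- descriptive metadata for interpreting the content --> </mets:dmdSec>
  <mets:amdSec>
    <mets:digiprovMD ID="prov-001">
      <mets:mdWrap MDTYPE="PREMIS">
        <mets:xmlData>
          <premis:event> <!-- e.g., a format conversion during export --> </premis:event>
          <premis:agent> <!-- e.g., the exporting KM-EP service or user --> </premis:agent>
        </mets:xmlData>
      </mets:mdWrap>
    </mets:digiprovMD>
    <mets:rightsMD ID="rights-001"> <!-- access and usage rights --> </mets:rightsMD>
  </mets:amdSec>
  <mets:fileSec> <!-- <mets:file> entries for the physical files --> </mets:fileSec>
  <mets:structMap> <!-- organization of the files within the package --> </mets:structMap>
</mets:mets>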

3.2. Data Compression and Data Decompression

For handling ZIP folders during export or import, a service needs to be implemented using existing PHP functionality to enable the writing of SIPs and the reading of DIPs. To enable the long-term preservation of video recordings, the use of the FFmpeg framework [23] will be designed for the conversion between the MP4 and MKV formats and for conversion to the H.264 video codec. The modification of video files by converting them to grayscale and removing the audio track to reduce the file size will also be modeled using FFmpeg. The pseudo-code snippet in Figure 3 shows a method to handle video conversion using the FFmpeg framework to convert between MP4 (line 18, -f mp4) and MKV (line 26, -f matroska) formats, remove the audio track (lines 14 and 22, -an), change the picture to grayscale (line 7, -vf format=gray), and convert the video recordings to the H.264 (line 12, -c:v libx264) or FFV1 (line 20, -c:v ffv1) video codec to make them suitable for long-term preservation. By modeling this pseudo-code, RC2.1–RC2.5 are addressed.
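For orientation, a minimal PHP sketch of such a conversion call is given below; it only assembles the FFmpeg options named above, and the function name and parameter handling are illustrative rather than the prototype's actual service interface.
<?php
// Builds an FFmpeg command line with the conversion options discussed above (illustrative helper).
function buildFfmpegCommand(string $input, string $output, bool $toMkv, bool $grayscale, bool $removeAudio): string
{
    $cmd = 'ffmpeg -i ' . escapeshellarg($input);
    if ($grayscale)   { $cmd .= ' -vf format=gray'; }   // migrate the picture to grayscale
    if ($removeAudio) { $cmd .= ' -an'; }               // remove the audio track
    $cmd .= $toMkv ? ' -c:v ffv1 -f matroska'           // lossless FFV1 in a Matroska (MKV) container
                   : ' -c:v libx264 -f mp4';            // H.264 in an MP4 container
    return $cmd . ' ' . escapeshellarg($output);
}
// Example: exec(buildFfmpegCommand('recording.mp4', 'recording_archive.mkv', true, true, true));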

3.3. Hash Values and Digital Signatures

Addressing RC3.1 and RC3.2 to ensure data integrity, mechanisms need to be modeled that provide the calculation and storage of hash values for each file during export. These hash values need to be stored in the metadata XML for each file so that they can be used for integrity checks during import. PHP Services will manage the calculation of the hash values during export and the validation of the hash values during import to ensure the integrity of the files. In the context of this paper, no deliberate manipulation of the metadata XML file is considered.
Addressing RC3.3 and RC3.4 to ensure data authenticity, mechanisms need to be modeled that generate digital signatures for each exported file. These signatures will also be stored in the metadata file and verified during import to ensure the authenticity of the files. The implementation will also be placed at PHP Services to manage the generation of private keys as well as digital signatures by using the key and hash value. In PHP, the “openssl_pkey_new” method of the OpenSSL extension can be used to generate the private key [24]. The “openssl_sign” method of the OpenSSL extension can be used to create the digital signature [24]. For the import, the “openssl_verify” method can be used in PHP to verify a digital signature of a transferred file using a public key during import [24].
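A minimal sketch of these steps, using only the PHP functions named above together with illustrative file and variable names, might look as follows:
<?php
// Export side: generate a key pair, hash the file, and sign the hash value (illustrative sketch).
$key = openssl_pkey_new(['private_key_bits' => 2048, 'private_key_type' => OPENSSL_KEYTYPE_RSA]);
openssl_pkey_export($key, $privatePem);                              // private key handed to the user
$publicPem = openssl_pkey_get_details($key)['key'];                  // public key kept for verification

$hash = hash_file('sha256', 'attachments/recording.mkv');            // fixity value stored in the metadata XML
openssl_sign($hash, $signature, $privatePem, OPENSSL_ALGO_SHA256);   // digital signature over the hash value

// Import side: recompute the hash and verify the stored signature.
$valid = hash_file('sha256', 'attachments/recording.mkv') === $hash
    && openssl_verify($hash, $signature, $publicPem, OPENSSL_ALGO_SHA256) === 1;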
Addressing RC3.5 to enable the user to export and import Emotion Analyses, a user interface needs to be designed. This includes options for video conversion, signing, and compression, as well as an overview for validating hash values and signatures during import. The user needs to interact with the user interface to progress through the step-by-step configuration selection and the verification of data integrity and authenticity.
For the export, a KM-EP user-interface dialog is needed to support step-by-step configuration. First, there will be interaction possibilities required to convert the video to either MP4 with H.264 video codec or to MKV with FFV1 video codec, and to remove the audio track or migrate the video to grayscale if needed. Also, the user should be able to either upload or generate a private key for the digital signature and to specify the ZIP folder’s name. After completing these steps, the export should start by clicking an export button.
When importing the ZIP folder to the system, a KM-EP user-interface dialog for uploading and validating the files is needed. The user must select which ZIP folder should be uploaded and the validation will start automatically. During the validation process, the XSD and XML files for the database entities will be validated followed by checks for hash values and signatures. If the validation was successful, the EDP should be reimported by clicking an import button.

4. Implementing a Software Prototype for Long-Term Preservation

This section will introduce the results of the phase system development according to the methodological framework by Nunamaker et al. [9]. The models of the previous section will be implemented to provide a software prototype addressing the research questions and remaining challenges identified in the second section.
Following the models and designs of the third section, the KM-EP system will be extended to support the long-term preservation of Emotion Analyses. In this section, the extension of the system with regard to metadata extraction, file conversion, and archival packaging will be introduced. Therefore, services like file handling, data handling, metadata functionality, and XML and XSD generation are implemented.
A utility service offers supporting functionalities, like loading and validating JSON configuration files, UUID generation, file type recognition, and providing default names for files. The data handling service reads the data and their structure from the database but also writes the database entities in the context of an import. The XSD service provides functionalities to generate XSD based on JSON configuration files or exported entities. The XML service is used for the dynamic generation of XML files from database entities and their relations based on the JSON configuration file but also to validate the XML files with a given XSD file. The metadata service enables the dynamic creation of METS/PREMIS-compliant XML metadata files that map complex information on file structures and data relationships. The file handling service provides reading, writing, archiving, and extracting files as well as inserting metadata without performing detailed validation or conversion of the files themselves.
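As an illustration of the validation step offered by the XML service, PHP's DOMDocument can check an exported XML file against its XSD; the wrapper function below is illustrative and not the prototype's actual method.
<?php
// Validates an exported XML file against its schema definition (illustrative wrapper).
function validateAgainstXsd(string $xmlPath, string $xsdPath): bool
{
    $doc = new DOMDocument();
    if (!$doc->load($xmlPath)) {
        return false;                        // file is not well-formed XML
    }
    return $doc->schemaValidate($xsdPath);   // true if the XML satisfies the XSD
}
// Example: validateAgainstXsd('DBData.xml', 'DBData.xsd');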
To support video conversion, a service is implemented which supports the conversion to MP4 or MKV and the optional removal of audio tracks or conversion to grayscale to reduce the file size using the FFmpeg framework [23], as shown in Figure 3.
For packing and unpacking the files to a ZIP folder and building a structure inside that folder, the ZIP service is implemented.
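A minimal sketch of this packing and unpacking with PHP's ZipArchive class, following the package layout modeled in Section 3 with illustrative paths, could look as follows:
<?php
// Packing: create the package as a ZIP folder with the modeled structure (illustrative paths).
$zip = new ZipArchive();
if ($zip->open('EmotionDataPackage.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) === true) {
    $zip->addFile('Metadata.xml', 'Metadata.xml');
    $zip->addFile('DBData.xml', 'DBData.xml');
    $zip->addFile('DBData.xsd', 'DBData.xsd');
    $zip->addFile('recording.mkv', 'attachments/recording.mkv');
    $zip->close();
}

// Unpacking: extract an uploaded package before validation and reimport.
$zip = new ZipArchive();
if ($zip->open('EmotionDataPackage.zip') === true) {
    $zip->extractTo('/tmp/edp-import');      // illustrative working directory
    $zip->close();
}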
To provide functionalities for hash values and digital signatures, the crypto service is implemented. It offers methods to generate public and private keys, sign files using hash values and the key, calculate hash values, and verify the signature by checking if the signature matches a public key and a hash value to verify the authenticity of the files.
Using these services, controllers were implemented to support the import and export via the user interface. The export controller provides functionality to find the video recording, generate and validate a private key, and create the ZIP folder. The final KM-EP user-interface export dialog shown in Figure 4 enables the user to display information about the original video file and select options for video conversion. Experts can also enable video modification options, such as converting to grayscale or removing the audio track. In the signature management section, a private key can either be uploaded or generated to ensure the integrity of the exported file. Finally, the form allows the user to enter a name for the ZIP file and offers the option to start the export process.
The import controller is needed to manage the upload of the ZIP folder, which is carried out in file chunks to support larger files and store them temporarily. It also provides ZIP validation for the XML files against the XSD schema file, extracts metadata as well as other relevant information for data consistency, and converts files to their original type if necessary. On the import user interface shown in Figure 5, users can select the ZIP file of the emotion-data package. Once selected, the relevant information on the analysis series is displayed, including the title, experiment category, description, and associated video recording. Below this is an overview table that lists all the files contained in the ZIP file. This table shows the file path, the file name, and the status of the hash validation and signature check. These elements are used to check and confirm the integrity and authenticity of the imported files. The import process is started by clicking on the “Import” button, which creates an analysis series in the system, including all associated objects and file attachments.
By implementing these two user interfaces, the user can access both use cases mentioned in Section 3 and export and reimport EDPs using a step-by-step configuration to match the context and needs of the specific preservation.

5. Evaluation

This section will assess the implemented software prototype introduced in the previous section to achieve the goals of the phase experimentation of the multimethodological approach by Nunamaker et al. [9], and verify the functionalities and user interfaces.

5.1. Evaluation Using Functional Tests

Based on [25], functional tests are used to check whether software fulfills the specified requirements and functions as intended. The focus here is on validating the functionality of the system, regardless of its internal implementation (so-called black box approach) [25]. Functional tests comprise various levels, such as unit tests, integration tests, system tests, and acceptance tests, which together are intended to ensure that all user requirements are met [25]. The test cases focus on the conformity of the data export and import processes with the defined requirements and on the consistency of the generated metadata and associated entities.
In the first test case, the functionality of exporting an emotion analysis from KM-EP is assessed. The criteria to be evaluated and the steps required for evaluation were defined in advance. The test was conducted step by step: First, after the export, the contents of the ZIP archive were checked for the folder structure to ensure conformity with the modeling. The DBData.xml and DBData.xsd files were then examined. The focus here was on the completeness and correctness of the database information and the associated entities. Finally, the system-generated Metadata.xml file, which was exported to the ZIP folder, was analyzed. To evaluate metadata quality, a completeness check was performed by comparing the generated metadata fields against OAIS and PREMIS requirements for content information, preservation description information (PDI), and fixity. The implemented metadata achieved full coverage of core OAIS entities—content, provenance, context, and fixity—and partial coverage of access rights and representational information. Overall, the test confirmed the conformity of the modeling and the successful implementation of the tested (partial) prototypes.
In the next test case, the previously exported ZIP folder is reimported. The aim of this test is to validate the import process. During the import process, the first step is to validate the database data contained in the DBData.xml file using the schema definition. In addition, the hash values of the file attachments are checked, and the digital signatures are verified. Finally, the analysis series is created together with the associated entities and file attachments. The import process was conducted step by step, starting with the validation of the DBData.xml file. As soon as the ZIP folder was selected and the import process started, the database data contained in the DBData.xml was checked against the associated schema definition. The results of the validation as well as the information on the file attachments and the associated entities are displayed to the user. After clicking the import button, the new data are created in the system. The test confirmed that reimported packages restored all entity relationships, metadata values, and file references without loss. This demonstrates that archived packages can serve as self-contained, reusable information objects suitable for long-term preservation and reanalysis.
Regarding the conversion of the video recordings, it was investigated how the selection of different video compression methods, including the choice of format, the conversion to grayscale, and the removal of the audio track, affects the file size. The analysis of file sizes in Table 1 shows that the choice of format and the use of modifications can contribute significantly to reducing the file size. MKV files are significantly larger than MP4 files. Without modifications, the MKV file is around forty-five times larger than the corresponding MP4 file, making MP4 the more memory-efficient choice.
Switching to grayscale leads to a significant reduction in file size in both formats. While the saving for MP4 is around 7–8%, it is up to 24% for MKV, making grayscale particularly effective for larger files. Removing the audio track also reduces the file size, by around 11% for MP4 and 5–7% for MKV. The combination of both modifications maximizes the savings and results in a reduction of up to 19% for MP4 and up to 28% for MKV. Overall, modifications such as grayscale and audio track removal enable significant storage space savings, especially for large files in MKV format. However, choosing the MP4 format remains the most efficient option for minimizing file size, regardless of the modifications applied.
In summary, the functional tests showed that there might be performance optimizations necessary for importing large files, like for importing and converting video recordings with more than 1 GB file size. Also, there is currently no information displayed on what file size is to be expected when exporting a ZIP folder to make the export more predictable for the user.

5.2. Evaluation Using Cognitive Walkthrough

Cognitive walkthrough is a specific method of usability analysis that focuses on the learnability of a product. The approach assumes that users often understand systems through independent exploration rather than using formal training programs. The planning of a cognitive walkthrough involves preparation to ensure that the method is implemented effectively and systematically. To begin the walkthrough, the target users were defined, the tasks were formulated, the process was defined, and the expected steps and results to be achieved were determined [26].
The user interface was provided, and the results were evaluated regarding the following points for each action of the task sequences as defined by C. Wilson [26]:
Will the user attempt to perform the correct action?
Will the user recognize that the correct action is available?
Does the user associate the correct action with the desired effect?
Does the user recognize that progress has been made after the correct action?
A “no” to any of the questions indicates a potential usability problem. In the follow-up of a cognitive walkthrough, the results are analyzed, solutions for identified usability problems are discussed, and their implementation is planned. Finally, the walkthrough process itself is evaluated to identify improvements for future applications [26].
The results of the cognitive walkthrough are summarized in Table 2, where users had to first export Emotion Analyses using video conversion to MP4, grayscale, and the H.264 video codec, as well as generating a private key. The second task was the reimport of the ZIP folder, ensuring that the data were successfully imported to the system.
Although the necessary action steps were recognized, the test subjects still made suggestions for improvements that can be considered as potential enhancements to the user interface. This includes, for example, an input field to change the name of the emotion analysis during import.
In summary, the evaluation shows that an OAIS-compliant export is possible and has been successfully implemented. The user interface can still be optimized for user-friendliness and performance while importing.

6. Discussion and Outlook

This paper contributes to Research Question 3 of the research project [6] by providing a software prototype in KM-EP with a dashboard for the long-term archiving of EDPs consisting of video recordings, emotion sequence data, and session analysis data and documents for human–human dialogs. The software prototype provides an export with structured metadata and processing information to comply with the OAIS Reference Model Framework and also enables the reimport of already archived data for further analysis. Therefore, the paper explored the OAIS principles to be implemented for ensuring long-term preservation. The metadata structure was introduced using METS and PREMIS to supply the necessary information for long-term preservation and future accessibility. In the metadata file, hash values and digital signatures were added to ensure data integrity and authenticity, which are verified during import. To reduce the file size of exported packages, options for video modification are available in the export user-interface in KM-EP, as video recordings are the largest files to be exported, while text files can only be compressed losslessly.
The prototype was assessed using functional tests and a cognitive walkthrough with two test subjects. The evaluation showed that the performance when converting large video files might need to be improved. Also, information should be displayed on what file size is to be expected when exporting EDPs to make the exports more predictable. More configuration steps may also be implemented in the future, like changing the name of Emotion Analyses during export.
In the future, further extensions are also conceivable. For example, data sensitivity issues when archiving medical data might need to be addressed. As already mentioned, not all attributes and aspects of PDI and representational information were written during export by the software prototype. Though it is extensible, further investigation might be necessary to decide which metadata fields should be added for long-term preservation and correct interpretation of the archived information by the designated community. Regarding data integrity and authenticity, the manipulation of the metadata file was not considered, and future work should address this to avoid importing invalid data. Despite these points, the paper shows an applicable approach for the long-term preservation of EDPs using the OAIS Reference Model Framework.

Author Contributions

Conceptualization, V.S. and M.P.; methodology, M.H. and M.X.B.; validation, V.S.; investigation, V.S. and M.P.; writing—original draft preparation, V.S.; writing—review and editing, V.S., M.H., and M.X.B.; visualization, V.S. and M.P.; supervision, M.H. and M.X.B.; project administration, V.S. and M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI	Artificial Intelligence
AIP	Archival Information Package
DIP	Dissemination Information Package
EDP	Emotion-Data Package
KM-EP	Knowledge Management Ecosystem Portal
MKV	Matroska Container
OAIS	Open Archival Information System
PDI	Preservation Description Information
RC	Remaining Challenge
RQ	Research Question
SIP	Submission Information Package
UCD	User-Centered Design

References

  1. Der Wissenschaftliche Beirat beim Bundesministerium für Wirtschaft und Energie. Digitalisierung in Deutschland—Lehren aus der Corona-Krise. Bundesministerium für Wirtschaft und Energie (BMWi) Öffentlichkeitsarbeit, March 2021. Available online: https://www.bmwk.de/Redaktion/DE/Publikationen/Ministerium/Veroeffentlichung-Wissenschaftlicher-Beirat/gutachten-digitalisierung-in-deutschland.pdf (accessed on 26 October 2025).
  2. Bitkom eingetragener Verein (e.V.). Corona Beschleunigt die Digitalisierung der Medizin—Mit Unterschiedlichem Tempo. Available online: https://www.bitkom.org/Presse/Presseinformation/Corona-beschleunigt-die-Digitalisierung-der-Medizin-mit-unterschiedlichem-Tempo (accessed on 26 October 2025).
  3. Maier, D.; Hemmje, M.; Kikic, Z.; Wefers, F. Real-Time Emotion Recognition in Online Video Conferences for Medical Consultations. In International Conference on Machine Learning, Optimization, and Data Science; Springer Nature: Cham, Switzerland, 2024; pp. 479–487. [Google Scholar] [CrossRef]
  4. ISO 14721:2025; Space Data System Practices—Reference model for an open archival information system (OAIS). 2025. Available online: https://www.iso.org/standard/87471.html (accessed on 30 October 2025).
  5. The Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (OAIS); MAGENTA BOOK; CCSDS Secretariat National Aeronautics and Space Administration: Washington, DC, USA, 2024; Volume CCSDS 650.0-M-3. [Google Scholar]
  6. Schreyer, V.; Bornschlegl, M.X.; Hemmje, M. Toward Annotation, Visualization, and Reproducible Archiving of Human–Human Dialog Video Recording Applications. Information 2025, 16, 349. [Google Scholar] [CrossRef]
  7. Salomon, D.; Motta, G. Handbook of Data Compression; Springer: London, UK, 2010. [Google Scholar] [CrossRef]
  8. Sayood, K. Introduction to Data Compression (The Morgan Kaufmann Series in Multimedia Information and Systems), 3rd ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2006. [Google Scholar]
  9. Nunamaker, J.F.; Chen, M.; Purdin, T.D.M. Systems Development in Information Systems Research. J. Manag. Inf. Syst. 1990, 7, 89–106. [Google Scholar] [CrossRef]
  10. Dappert, A.; Guenther, R.S.; Peyrard, S. Digital Preservation Metadata for Practitioners; Springer International Publishing: Cham, Switzerland, 2016. [Google Scholar] [CrossRef]
  11. Gartner, R.; Lavoie, B. Preservation Metadata, 2nd ed.; Charles Beagrie Ltd.: Salisbury, UK, 2013. [Google Scholar] [CrossRef]
  12. The Library of Congress. METS: An Overview & Tutorial. Available online: https://www.loc.gov/standards/mets/METSOverview.html (accessed on 26 October 2025).
  13. Manz, O. Gut Gepackt—Kein Bit zu Viel; Springer Fachmedien: Wiesbaden, Germany, 2020. [Google Scholar] [CrossRef]
  14. Li, Z.-N.; Drew, M.S.; Liu, J. Fundamentals of Multimedia; Springer International Publishing: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  15. Bunkus, M. What Is Matroska? Available online: https://www.matroska.org/what_is_matroska.html (accessed on 21 August 2025).
  16. Trac. FFV1 Encoding Cheatsheet. Available online: https://trac.ffmpeg.org/wiki/Encode/FFV1 (accessed on 21 August 2025).
  17. Jones, E. Data Accuracy vs. Data Integrity: Similarities and Differences. Available online: https://www.ibm.com/think/topics/data-accuracy-vs-data-integrity (accessed on 21 August 2025).
  18. Mittelbach, A.; Fischlin, M. The Theory of Hash Functions and Random Oracles; Springer International Publishing: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  19. Falcetta, F.S.; de Almeida, F.K.; Lemos, J.C.S.; Goldim, J.R.; da Costa, C.A. Automatic documentation of professional health interactions: A systematic review. Artif. Intell. Med. 2023, 137, 102487. [Google Scholar] [CrossRef] [PubMed]
  20. Clemen, J.M.B.; Teleron, J.I. Advancements in Encryption Techniques for Secure Data Communication. Int. J. Adv. Res. Sci. Commun. Technol. 2023, 3, 444–451. [Google Scholar] [CrossRef]
  21. Easttom, W. Modern Cryptography; Springer International Publishing: Cham, Switzerland, 2022. [Google Scholar] [CrossRef]
  22. PREMIS Editorial Committee. PREMIS Data Dictionary for Preservation Metadata, Vol. 3.0. 2015. Available online: https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf (accessed on 21 August 2025).
  23. FFmpeg. ffmpeg Documentation. Available online: https://ffmpeg.org/ffmpeg.html (accessed on 21 August 2025).
  24. The PHP Documentation Group. OpenSSL. Available online: https://www.php.net/manual/de/book.openssl.php (accessed on 21 August 2025).
  25. Leloudas, P. Introduction to Software Testing; Apress: Berkeley, CA, USA, 2023. [Google Scholar] [CrossRef]
  26. Wilson, C. Cognitive Walkthrough. In User Interface Inspection Methods; Elsevier: Amsterdam, The Netherlands, 2014; pp. 65–79. [Google Scholar] [CrossRef]
Figure 1. JSON model for configuration of content information.
Figure 2. Metadata structure using METS and PREMIS components.
Figure 3. Pseudo-code for video conversion.
Figure 4. Screenshot of the implemented export user-interface in KM-EP.
Figure 5. Screenshot of the implemented import user-interface in KM-EP.
Table 1. Effects of video conversion options on file size.
Format | With Audio Track | Grayscale | File Size
MP4 | Yes | No | 31.4 MB (32,970,323 Bytes)
MP4 | Yes | Yes | 29.2 MB (30,662,751 Bytes)
MP4 | No | No | 27.8 MB (29,228,090 Bytes)
MP4 | No | Yes | 25.6 MB (26,920,474 Bytes)
MKV | Yes | No | 1.41 GB (1,522,372,157 Bytes)
MKV | Yes | Yes | 1.13 GB (1,222,986,897 Bytes)
MKV | No | No | 1.34 GB (1,441,165,178 Bytes)
MKV | No | Yes | 1.06 GB (1,141,779,917 Bytes)
Table 2. Cognitive walkthrough results.
Task | Required Steps | Successful Steps | Problem Rate
Export the ZIP folder | 8 | 8 | 0%
Reimport the ZIP folder | 4 | 4 | 0%