1. Introduction
Information and communications technology (ICT) has become a substantial part of daily life. The volume of digital information stored on computers and multimedia devices is also increasing, in particular multimedia content such as images, audio, and video, and video is one of the most significant categories of these multimedia data. However, as asserted in [1,2], the proliferation and ease of falsification of this class of multimedia data present a daunting challenge to society and call for an advanced file fingerprinting mechanism [3,4]. Highlighting this notion, Reference [5] posited that the trustworthiness of a multimedia video is sacrosanct, the lack of a scientifically verifiable method notwithstanding. This challenge can be attributed to the sophistication of editing software, which has evolved to enable inexperienced users to manipulate the content of digital data with little effort and with a high-quality output. As a consequence, questions regarding media authenticity are of growing significance, particularly in litigation, where important decisions might be based on the reliability of the digital evidence [6]. A proper chain of custody and a proper chain of evidence are also required to ensure the repeatability and possible expert presentation of a digital artefact [6,7,8].
The digital forensics discipline, the field tasked with applying proven scientific methods to validate the reliability of digital artefacts [9,10], has seen a steady growth in the number of professionals capable of extracting and verifying the authenticity of multimedia data. Whilst this surge has been prevalent in developed countries where digital crimes are thoroughly investigated, the same cannot be said for developing nations. This is, however, inversely related to the reality of crime in developing nations. Digital criminals tend to leverage the availability of state-of-the-art software and criminal networks to perpetrate seemingly sophisticated multimedia-related crimes. Therefore, a surge of fake multimedia content tends to dominate the cyber-ecosystem of most developing countries without a corresponding forensic/policing capability. Furthermore, the search for “better pay” in a seemingly “privileged” discipline has led to the migration of digital forensic experts from developing nations to advanced settings. Thus, the developing nations are left with a declining ratio of forensic experts to cyber criminals. A potential response to this declining ratio is the integration of automation (a drive towards the push button approach) in the forensic investigation process.
However, as asserted in [11], automation presents challenges of its own. These include software and result verification challenges, the tendency to over-rely on a tool (which could result in partial analysis), and the propensity to inhibit the soundness of the forensic process (given that the investigation process would be an art rather than a science, at best). These concerns can be summarized as the potential for a lack of reliability. Reliability in this regard involves both chain of custody and chain of evidence assurance. Addressing this automation challenge has further been asserted to provide a basis for reducing departmental digital investigation workload, to promote knowledge retention, and to serve as a means towards forensic standardization and investigation coherence [11,12].
The remainder of this paper is organized as follows: Section 2 gives the background of this study and the related studies. Section 3 presents the method used to realize the push button forensic concept. Section 4 presents an implementation of the proposed approach, including the developed push button tool. Section 5 gives a discussion, and Section 6 presents the conclusion and future work of the study.
2. Background and Related Literature
Multimedia Forensics (MF) is a digital forensics sub-domain that applies scientific techniques across a variety of digital content (audio, video, photo, etc.) for electronic discovery [13,14]. Like computer forensics [15], it involves the discovery of the source and/or location of multimedia data from their file metadata. Additionally, MF is tasked with the extraction of useful information for authentication and identification; for example, forgery detection, similarities between images, and the rate of accurate detection of multimedia facets. Image forensics plays a vital role in proving the authenticity and integrity of digital images by attempting to detect forgeries such as copy-move, copy-paste, region duplication, forged region, and region replacement within an image [16]. Audio forensic analysis, on the other hand, is the process of collecting, examining, and reviewing audio recordings to extract facts that are admissible during litigation by a court of competent jurisdiction [17]. Audio forensics has several applications, which relate to the acoustic environment or location where the audio was recorded, the identification of speakers, audio enhancement, or the actual device used to record the audio file. Similarly, video forensics aims to evaluate the authenticity and integrity of a moving image and integrated audio stream (video content) through the analysis of inherent device characteristics or processing artefacts in the video data [18]. Basically, MF focuses on source identification and forgery detection. Whilst source identification focuses on identifying or inferring knowledge about the source of digital information, forgery detection attempts to uncover traces of falsification by assessing the authenticity of the digital content [19]. MF achieves this goal by relying on the extraction of facts and evidence to authenticate the integrity of digital data [20].
Videos are made by converting a camera’s electrical impulses and saving the information as digital media. The number of still images per unit of video time is referred to as the frame rate. Digital video clips typically use about 12–30 frames per second (frame/s), with 24 frame/s being widely used. The higher the frame rate, the smoother the video will appear. MP4 video, for instance, uses a sequence of pictures (made up of discrete pixels) that can be viewed in succession to create the impression of motion, which exploits the persistence of vision of the human visual system [21].
Each of these pixels can be represented by a number that identifies its value, which makes it easy for a computer to manipulate and store. Video falsification is the malicious modification of digital video content to obscure an entity or an event or to change the meaning conveyed by the video, whereas video tampering detection aims to discover the traces of alteration and thereby evaluate the trustworthiness and integrity of the video file [22]. Related studies on video forensics are presented in the following paragraphs.
A large volume of research has proposed techniques and methods to confirm the trustworthiness and integrity of digital video evidence. These techniques assert that modifying the content of a video introduces specific artefacts that can be used to detect alteration of a given video file. Detection techniques are classified as passive (blind) and active techniques [23]. According to [24], the availability of low-cost electronic multimedia devices and the high level of data processing capabilities have made video forensics increasingly important. Reference [24] focused on discrepancies in video content, drawing on the human visual system and image similarity measurement to find modifications in videos. This technique could readily detect alterations that are not noticeable to the human eye. The study in [25] reported that the accessibility of low-cost, portable, and highly usable digital multimedia devices has significantly increased the likelihood of location-less, network-related, or time-constrained digital multimedia. As a consequence, the authentication and verification of a given content have become increasingly difficult. The study further opined that this difficulty has several consequences when the digital content is used as a corroborating piece of evidence.
Similarly, the study in [26] proposed a video copy recognition system based on content fingerprinting that could also be used for the indexing and validation of video. The system uses a fingerprint extraction algorithm, combined with a fast approximate search algorithm, to extract compact content-based signatures from separate images of the video frames. Each such image represents a short segment of the video and contains temporal, as well as spatial, information about the video segment. The system extracts and pre-stores fingerprints of all the videos stored in the database. However, this approach only works for very short videos, thus making the approach inefficient for forensic investigation purposes. By limiting the investigation process to frame removals only, the study in [27] proposed a collection of automated frame removal or addition recognition techniques that considered changes in the P-frame prediction error of a video. This technique focuses on video codecs that use a fixed-length group of pictures (GOP) when compressing segmented frames in a video. Moreover, the result is only reliable if anti-forensics have not been applied to the video content. Leveraging the signal processing methodology, the studies in [25,28] inspected effective approaches to reconstruct and authenticate the processing history of video data. The studies asserted that most alterations are not reversible and leave some “footprints” in the reconstructed signals, which can be analysed to recognize the previous processing steps. However, empirical evaluation has shown that simple processing chains of a signal can be reconstructed with a negligible amount of modification to the signal, rendering the approach inefficient for checking the footprint of the video content.
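To make the content-fingerprinting idea discussed for [26] concrete, the following minimal Python sketch computes a compact signature per frame and looks up the closest stored signature. It is an illustration under stated assumptions, not the algorithm of [26]: the average-hash fingerprint, the Hamming-distance comparison, the exhaustive (rather than approximate) search, and all function names are substitutes chosen here for brevity, and frames are assumed to be small grayscale arrays given as nested lists.

```python
# Illustrative only: an average-hash fingerprint stands in for the
# (unspecified here) fingerprint extraction algorithm of the cited study.

def average_hash(frame):
    """Return an integer whose bits mark pixels brighter than the frame mean.

    `frame` is assumed to be a non-empty 2D list of grayscale values.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def nearest(query, database):
    """Exhaustive search for the stored fingerprint closest to `query`.

    `database` maps a (hypothetical) clip name to its fingerprint.
    Returns (name, distance).
    """
    name = min(database, key=lambda k: hamming(query, database[k]))
    return name, hamming(query, database[name])
```

A small distance suggests that the queried frame is a copy of a stored one, possibly after mild processing. As with the systems reviewed above, the lookup is only meaningful when reference fingerprints of the authentic videos have been stored beforehand.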
In an attempt to introduce an automation process (referred to as push button forensics), the study in [18] developed a system that explored the video stream of digital cameras and mobile phones in order to extract the file format structures. Upon successful extraction, the system then validated the structure against the original video file. Captured information included the origin of the file, identifying the true acquisition device model, and the processing software that was used for the recording. Furthermore, it required adapted file parser(s) to read and extract all obtainable file format data and metadata from the videos in the database created. This approach is a passive technique of detecting alterations in videos. The need to store all model- or vendor-specific peculiarities of the digital devices used for creating video content was a major limitation of the study. A similar study in [22] established a method for detecting suspicious areas using noise characteristics in static scene (surveillance) videos.
A noise level function (NLF) describes the variance in image signals of the irradiance-dependent noise. The study used a probabilistic design approach, which regulated the noise characteristics at each pixel. Pixels in spliced areas, where the NLFs were incompatible with the rest of the image, were separated using maximum a posteriori (MAP) estimation of the noise model. However, the study did not account for the failure of frame structures when the repeated frames were fewer than the calculated window size, especially when frame replication took place in a different order. Reference [29] also developed the VidentifierTM (VTM) Forensic system for automatically recognizing the modification of images and videos. VTM Forensic has two main features that are of interest to the multimedia community. First, it is robust, precisely distinguishing difficult video alterations. Secondly, it is efficient, even on a very large scale. To recognize video modifications, VTM Forensic uses a combination of a large-scale multidimensional NV-tree index and fine-grained local image descriptors. VTM Forensic is tolerant of many visual changes, including mirroring, camrips, compression, and subtitles. It, however, requires that the fingerprints of the authentic versions of the videos be stored in the database for assessment. The feasibility of creating a valid database for all original versions of video files cannot be ensured during a digital investigation. These studies attempted to develop viable alternatives for video forensics, albeit with inherent limitations. Furthermore, the forensic soundness of the push button forensic modality (PBFM) tools developed was ignored. To address these observed limitations, the current research proposed a forensically sound PBFM tool for the investigation of the MP4 video file format. The file format selection hinged on a limited number of potential video file formats and the possibility of an exhaustive video file format integration.
5. Discussion
The growing number of multimedia devices and the demand for digital data call for a scientific and forensically proven approach to verify the authenticity of digital evidence presented in video content. This is necessary because forensic analysis plays an important role in criminal investigations and civil litigation cases brought before a court of law. Forensic practitioners have researched several techniques and methods to verify the trustworthiness and integrity of the content of a digital video [23]. The tools developed use different alteration detection techniques to authenticate digital video content. One example is the VidentifierTM (VTM) Forensic system for automatically identifying modification in images and videos, developed by Asmundsson and Lejsek [29]. The method used by VTM is a fingerprint-based extraction unit associated with a database server where the geometric and time-based properties of the extracted fingerprints are stored. The system requires that the fingerprints of the authentic videos be obtained and stored in the database to be used for assessing videos under investigation. However, a priori knowledge of digital content may not be available for forensic investigation. This is therefore a limitation of the VTM Forensic system, given that obtaining the authentic version of all available video files may prove challenging.
Secondly, the study in [25] developed an approach to reconstruct and authenticate the processing history of video data. The study assumed that alterations are not reversible and that they leave some evidence in the reconstructed signals, which can be analysed to recognize the previous processing steps. However, according to [28], simple processing chains of a signal can be reconstructed without adding an excessive amount of modification to the signal, thus rendering the approach inefficient for checking the integrity of the video content. The study in [27] proposed a collection of automated frame removal detection techniques. This system achieved video authentication with a mathematical model of video frame removal and addition detection techniques aimed at video codecs that use a fixed-length group of pictures (GOP) when compressing segmented video frames. The limitation is that the method is only reliable if anti-forensic techniques have not been applied to the video content. Similarly, the study in [18] developed a passive technique of alteration detection that explores video streams and extracts the file format structures of videos from multimedia devices. The approach is based on recognizing the true acquisition device model or the processing software that was used for the recording, and it required adapted file parser(s) to read and extract all obtainable file format data and metadata from videos. Detecting video file structure information based on camera and mobile phone model specifics may not be effective in the future, because determining all model- or vendor-specific peculiarities of the digital devices used for creating video content is challenging.
Lastly, the study in [30] proposed an algorithm for detecting frame deletion in HEVC-coded videos, with the videos classified by machine learning classifiers. The research employed the passive alteration detection technique of multimedia forensic methods. Results from the study revealed that learning-based classifiers were more efficient than model-based ones. However, the system had limited forgery detection capabilities when the number of deleted frames was double the number of groups of pictures (GOPs) in digital videos. This implies that videos falsified by experienced anti-forensic individuals will lead the system to report a false negative. A descriptive summary of these tools is presented in Table 6.
While several tools have been developed with different alteration detection techniques, as reflected in the comparative analysis in Table 6, the file signature method of the active authentication technique considered in this study presents a white-box paradigm. Authentication is used loosely, in this regard, to refer to the process of identifying and validating the file signature of the given MP4 file using the baseline Hex structure as the signature. This technique is considered the most effective of all the approaches examined [22]. It involves extracting the unique digital structure of the file signature that is embedded when video files are created. Moreover, altering the file in any manner degrades the embedded signature [21]. Furthermore, the white-box paradigm ensures that the reliability of the forensic process can be evaluated. This study did not presume any prior knowledge of the authentic versions of the video files under investigation, unlike other studies that attempted to extract fingerprints from the original versions of the videos. Furthermore, this study did not rely only on the architectural structures of the file formats or on the acquisition devices, because anti-forensic techniques can falsify the structure and source of digital content, as shown in previous studies.
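The idea of validating an MP4 file signature against a baseline Hex structure can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the tool developed in this study: it relies only on the general layout of MP4 files, whose first box normally carries the type `ftyp` at byte offset 4 followed by a four-byte major brand, and the baseline values, the set of accepted brands, and the function names are hypothetical. It inspects only the leading 12 bytes, so it says nothing about alterations elsewhere in the file.

```python
import struct

# Hypothetical baseline: expected leading box type and accepted major brands.
BASELINE_BOX_TYPE = b"ftyp"
BASELINE_BRANDS = {b"isom", b"mp41", b"mp42"}


def leading_signature(data: bytes) -> dict:
    """Parse the first 12 bytes: 4-byte big-endian box size, box type, major brand."""
    if len(data) < 12:
        raise ValueError("fewer than 12 bytes; no complete signature present")
    size, box_type, brand = struct.unpack(">I4s4s", data[:12])
    return {
        "size": size,
        "box_type": box_type,
        "brand": brand,
        "hex": data[:12].hex(" "),  # Hex view of the signature, for the report
    }


def matches_baseline(data: bytes) -> bool:
    """Return True when the leading bytes agree with the assumed baseline."""
    try:
        sig = leading_signature(data)
    except ValueError:
        return False
    return sig["box_type"] == BASELINE_BOX_TYPE and sig["brand"] in BASELINE_BRANDS
```

In practice, the bytes would be read from the file under investigation, for example with `open(path, "rb").read(12)`, and a mismatch would be reported together with the Hex view so that another examiner can inspect the same bytes.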
These studies applied diverse video alteration techniques, which can be summarized as frame insertion, frame deletion, frame and header substitution, metadata alteration, and header information alteration. Furthermore, video editing tools such as Hex editor, Adobe Premiere Pro (for timeline-based video editing), Freemake Video Converter, Windows Movie Maker, EZGif, Movie Maker, Corel VideoStudio Ultimate, Magix Movie Edit Pro, OpenShot (available at https://www.openshot.org/ (accessed on 5 March 2021)), aTube Catcher, Camtasia Studio, and Adobe Spark are potential tools that can be explored to falsify video content. Recent advances that explore the deep learning approach in image alteration are also applicable. To further evaluate the effectiveness of the developed tool, Adobe Premiere Pro, aTube Catcher, and Windows Movie Maker were used to perform MP4 file alteration. Three MP4 files were altered and then verified using the developed tool. The result, as shown in Table 7, supports the theoretical supposition presented in this study.
The verification process presented in Table 7 shows that signature mismatch can be used to distinguish altered files irrespective of the alteration techniques applied. This study therefore presented the background for a reliable approach towards a PBFM platform. Such a platform is essential to address the growing skill shortage in developing nations. The exodus of forensic experts from most developing nations, as well as the corresponding lack of competent forensic examiners, could pose a consequential challenge to the global forensics community. The proposed tool, however, provides a fundamental basis for the admissibility and reliability of forensic artefacts, more specifically by complying with the reliability assurance process stated in Figure 1. Furthermore, there is a constant need to incorporate cost-saving mechanisms (forensic readiness) in digital forensics, which in the context of this study may be useful to an organization. Forensic readiness allows planning ahead of an incident so that evidence can be obtained when needed in order to reconstruct an event [31,32]. Additionally, the automation process of the tool ensures that every action taken by the user while using the tool is logged, and the resultant output of the analysis is carefully documented with a corresponding hash digest for both the logs and the analysis result. Through this, the result of the automation process can be verified by another examiner, when required, as asserted in [11,33].
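The logging and hashing step described above can be sketched as follows. The sketch is only an assumption about how such a step might look: the choice of SHA-256, the JSON serialization, and the names `ActionLog`, `seal`, and `verify` are illustrative and are not taken from the developed tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(payload: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def to_bytes(obj) -> bytes:
    """Deterministic JSON serialization, so equal content gives equal digests."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


class ActionLog:
    """Hypothetical in-memory log of the actions taken by the user."""

    def __init__(self):
        self.entries = []

    def record(self, action: str, detail: str = "") -> None:
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "detail": detail,
        })


def seal(log: ActionLog, result: dict) -> dict:
    """Compute one digest for the log and one for the analysis result."""
    return {
        "log_sha256": sha256_hex(to_bytes(log.entries)),
        "result_sha256": sha256_hex(to_bytes(result)),
    }


def verify(log_entries: list, result: dict, digests: dict) -> bool:
    """Recompute both digests, as a second examiner would, and compare."""
    return (sha256_hex(to_bytes(log_entries)) == digests["log_sha256"]
            and sha256_hex(to_bytes(result)) == digests["result_sha256"])
```

A second examiner who receives the log, the result, and the two digests can call `verify` to confirm that neither the log nor the result has changed since the analysis was sealed.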