A Prototype Web Application to Support Human-Centered Audiovisual Content Authentication and Crowdsourcing
Abstract
1. Introduction
1.1. Related Work
1.2. Project Motivation and Research Objectives
2. Materials and Methods
2.1. A Web Application for Audio Tampering Detection and Crowdsourcing
- Implements state-of-the-art analysis options. An ensemble of algorithms is incorporated to address multiple audio tampering strategies. The analyses may include encoding detection, assessment of recording conditions, background noise clustering, and others.
- Follows a modular approach. The algorithms provided in the initial implementation are available as individual modules. This allows existing algorithms to be upgraded in the future and the initial toolbox to be extended.
- Supports human-centered decision-making. In line with the rationale of the MAthE solutions, the application promotes computer-assisted decision-making. The algorithmic implementations provide intuitive visualizations that assist the user in content authentication, while also taking into account the user’s personal experience and perception, as well as the context of the asset under investigation.
- Is publicly available. The web framework aims to reach a wide audience, so an important prerequisite is that it is freely available for anyone to use and contribute to.
- Requires no audio or technical expertise. The design principles prioritize ease of use, following a typical workflow. A more experienced user with a technical and signal processing background may gain better insight into the produced visualizations. However, the detection of outliers or suspicious points in a file timeline is self-explanatory and does not require a deep understanding of the algorithms and mechanisms.
- Promotes crowdsourcing. Users and teams can become involved and contribute to the project in several ways to further advance the field of audio tampering detection. They can submit files, annotated as tampered or not tampered, with a brief justification. Users can also randomly browse files from the dataset, analyze them, and mark them as tampered or not tampered. Finally, as this is an open-source project following a modular architecture, researchers and teams are encouraged to contribute with code and extensions.
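As an illustration of the crowdsourced annotations described above, the following minimal sketch shows one possible way to represent a user annotation and to count the labels collected per file. The `Annotation` record, its field names, and the `tally` helper are hypothetical and are not taken from the application's code base.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One user's judgment about one file (hypothetical record layout)."""
    file_id: str
    user_id: str
    tampered: bool
    justification: str = ""


def tally(annotations):
    """Count 'tampered' / 'not tampered' labels for every annotated file."""
    counts = {}
    for a in annotations:
        label = "tampered" if a.tampered else "not tampered"
        counts.setdefault(a.file_id, Counter())[label] += 1
    return counts


# Illustrative data only.
sample = [
    Annotation("file-1", "alice", True, "abrupt change in background noise"),
    Annotation("file-1", "bob", True),
    Annotation("file-1", "carol", False),
    Annotation("file-2", "alice", False),
]
per_file = tally(sample)
```

The sketch only aggregates counts; how the counts are turned into a verdict is left to the user, in keeping with the human-centered rationale.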
2.2. The Computer-Supported Human-Centered Approach
2.3. An Ensemble of Methods for Audio Tampering Detection
2.3.1. Common Audio Representations
2.3.2. Different Encoding Recognition
- Heavy compression is applied to the audio file under investigation (FUI), creating a double-compressed file (DCF).
- A feature vector is extracted from every time frame of both the FUI and the DCF, creating the (T × F) matrices Fi(t), i = 1, 2, where T is the number of time frames and F is the length of the feature vector.
- For every time frame t, the Euclidean distance D(t) between the corresponding feature vectors of the two matrices is calculated.
- D’(t) = D(t) − D(t − 1) is calculated to capture the change between successive time frames.
- D’(t) is expected to present local extrema in time frames that include a transition between audio segments of different compression, indicating possible tampering points.
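The distance steps above can be sketched with NumPy as follows. The sketch assumes that the two (T × F) feature matrices have already been extracted; the compression and feature-extraction steps are omitted. The function names, the convention D’(0) = 0, and the threshold rule in `candidate_points` (flagging frames where |D’(t)| exceeds k standard deviations) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def frame_distance(features_fui, features_dcf):
    """Euclidean distance D(t) between matching rows of two (T x F) matrices."""
    a = np.asarray(features_fui, dtype=float)
    b = np.asarray(features_dcf, dtype=float)
    if a.shape != b.shape:
        raise ValueError("feature matrices must have the same (T, F) shape")
    return np.linalg.norm(a - b, axis=1)


def distance_derivative(d):
    """D'(t) = D(t) - D(t - 1), with D'(0) set to 0 so indices match frames."""
    d = np.asarray(d, dtype=float)
    return np.diff(d, prepend=d[0])


def candidate_points(d_prime, k=3.0):
    """Frames where |D'(t)| exceeds k standard deviations (assumed rule)."""
    return np.flatnonzero(np.abs(d_prime) > k * np.std(d_prime))


# Synthetic example: the DCF features deviate more strongly from frame 120 on,
# imitating a segment that originally had a different compression.
T, F = 200, 13
fui = np.arange(T * F, dtype=float).reshape(T, F)
offset = np.where(np.arange(T) < 120, 0.1, 1.0)[:, None]
dcf = fui + offset

d = frame_distance(fui, dcf)
d_prime = distance_derivative(d)
suspicious = candidate_points(d_prime)
```

In the application the curve is presented to the user for visual inspection, so an automatic threshold like the one above would serve only as an optional aid.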
2.3.3. Reverberation Level Estimation
2.3.4. Silent Period Clustering
2.4. Crowdsourcing for Dataset Creation, Validation, and User Cooperation
2.5. Common Use Case Scenarios
- Scenario 1: A user submits a file and uses the toolbox to determine its authenticity.
- Scenario 2: A user is unable to decide and asks for help from the community.
- Scenario 3: A user browses the database to annotate files.
3. Results
3.1. Convolutional Neural Network Regression Model for Signal-to-Reverberation-Ratio Estimation
3.2. Implementation and Deployment of the Prototype for Human-Centered Audio Authentication Support and Crowdsourcing
4. Discussion
- A novel approach is proposed for audio tampering detection, where decision-making is held by the human-in-the-loop in a computer-assisted environment. This approach makes use of technical advances while mitigating their limitations and unreliability, and it proposes a solution that can be applied immediately in journalistic practice.
- The solution is provided openly as a service, allowing its use by journalists and the audience, without any limitations on their equipment or platform.
- The application follows a modular approach. This means that the modules that are integrated in the prototype can be updated easily, and more modules can be added in the near future.
- A CNN model for data-driven SRR estimation, intended for use in audio authentication, was presented and evaluated.
- A crowdsourcing approach was introduced for both user collaboration in media authentication and dataset creation and annotation. Users help extend the dataset of tampered media files and also assist other users who request support in the authentication of specific files.
5. Limitations
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Signal (%) | 100 | 90 | 80 | 70 | 60 | 50 | 40 | 30 | 20 | 10 | 0 |
---|---|---|---|---|---|---|---|---|---|---|---|
Reverberation (%) | 0 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
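The signal and reverberation percentages in the table above can be read as eleven mixing ratios. The sketch below shows one possible interpretation, assuming that the percentages act as linear amplitude weights applied to a dry signal and its fully reverberant counterpart. This interpretation, the `mix` helper, and the synthetic signals are assumptions made for illustration only.

```python
import numpy as np

# Percentages as listed in the table: 100, 90, ..., 0 and 0, 10, ..., 100.
SIGNAL_PERCENT = list(range(100, -1, -10))
REVERB_PERCENT = [100 - s for s in SIGNAL_PERCENT]


def mix(dry, wet, signal_percent):
    """Blend a dry and a reverberant signal with linear weights (assumed)."""
    dry = np.asarray(dry, dtype=float)
    wet = np.asarray(wet, dtype=float)
    s = signal_percent / 100.0
    return s * dry + (1.0 - s) * wet


# Synthetic stand-ins for a dry recording and its reverberant version.
dry = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
wet = np.zeros(100)
mixtures = [mix(dry, wet, s) for s in SIGNAL_PERCENT]
```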
Layer | Type | Configuration |
---|---|---|
1 | Convolutional 2D Layer | 16 filters; kernel size = (3,3); strides = (1,1) |
2 | Max Pooling 2D Layer | Pool size = (2,2) |
3 | Dropout | Rate = 0.25 |
4 | Convolutional 2D Layer | 32 filters; kernel size = (3,3); strides = (1,1) |
5 | Max Pooling 2D Layer | Pool size = (2,2) |
6 | Dropout | Rate = 0.25 |
7 | Convolutional 2D Layer | 64 filters; kernel size = (3,3); strides = (1,1) |
8 | Dropout | Rate = 0.25 |
9 | Convolutional 2D Layer | 128 filters; kernel size = (3,3); strides = (1,1) |
10 | Convolutional 2D Layer | 256 filters; kernel size = (3,3); strides = (1,1) |
11 | Flatten Layer | |
12 | Dense Neural Network | Output weights = 64; activation = ReLU; L2 regularizer |
13 | Dense Neural Network | Output weights = 64; activation = ReLU |
14 | Dropout | Rate = 0.25 |
15 | Dense Neural Network | Output weights = 24; activation = Linear |
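To make the effect of the layer stack in the table above more tangible, the following sketch traces the tensor shape through the fifteen layers in plain Python. The table does not state the input size or the padding mode, so the 128 × 128 single-channel input and the unpadded ("valid") convolutions used here are assumptions, and `trace_shapes` is a hypothetical helper rather than part of the published model code.

```python
# Layer list transcribed from the table: (kind, argument).
LAYERS = [
    ("conv", 16), ("pool", 2), ("dropout", 0.25),
    ("conv", 32), ("pool", 2), ("dropout", 0.25),
    ("conv", 64), ("dropout", 0.25),
    ("conv", 128), ("conv", 256),
    ("flatten", None),
    ("dense", 64), ("dense", 64), ("dropout", 0.25), ("dense", 24),
]


def trace_shapes(height, width, channels=1):
    """Return the output shape after each layer for a given input size."""
    shape = (height, width, channels)
    shapes = []
    for kind, arg in LAYERS:
        if kind == "conv":
            # 3x3 kernel, stride 1, no padding (assumed): each side shrinks by 2.
            h, w, _ = shape
            shape = (h - 2, w - 2, arg)
        elif kind == "pool":
            # Non-overlapping max pooling with a square window of size arg.
            h, w, c = shape
            shape = (h // arg, w // arg, c)
        elif kind == "flatten":
            h, w, c = shape
            shape = (h * w * c,)
        elif kind == "dense":
            shape = (arg,)
        # Dropout leaves the shape unchanged.
        shapes.append(shape)
    return shapes


shapes = trace_shapes(128, 128)
for index, ((kind, _), shape) in enumerate(zip(LAYERS, shapes), start=1):
    print(index, kind, shape)
```

Under these assumptions the flattened vector is large compared with the 64-unit dense layers that follow it, which is consistent with the regularization (dropout and an L2 penalty) listed in the table.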
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Vryzas, N.; Katsaounidou, A.; Vrysis, L.; Kotsakis, R.; Dimoulas, C. A Prototype Web Application to Support Human-Centered Audiovisual Content Authentication and Crowdsourcing. Future Internet 2022, 14, 75. https://doi.org/10.3390/fi14030075