Electronics
  • Article
  • Open Access

10 May 2023

Design of Enhanced Document HTML and the Reliable Electronic Document Distribution Service

1 Graduate School of Public Policy and Information Technology, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
2 Department of Industrial and Information System Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Feature Papers in Computer Science & Engineering

Abstract

Electronic documents are becoming increasingly popular across industries and sectors because they are more convenient and cost-efficient than physical documents. PDF is a widely used format for creating and sharing electronic documents, while HTML is the foundation of the web pages displayed on mobile devices, such as smartphones and tablets. HTML is becoming a more critical document format now that mobile environments have become the primary communication channel. However, HTML has no standard content integrity feature, and an HTML-based electronic document consists of a set of related files, which makes it vulnerable as a reliable electronic document. We previously proposed Document HTML, a single independent file with extended meta tags that serves as a reliable electronic document, and Chained Document, a single independent file that uses a blockchain network to secure content integrity and delivery assurance. In this paper, we improved the definition of Document HTML and studied certified electronic document intermediaries. Additionally, we designed and validated an electronic document distribution service that uses Enhanced Document HTML in practice. Moreover, we conducted experimental verification using a tax notification electronic document, which has one of the highest distribution volumes in Korea, to confirm how Document HTML provides a content integrity verification feature. Document HTML can be used by an enterprise that must send a reliable electronic document to a customer through an electronic document delivery service provider.

1. Introduction

A document is essential for communication between an enterprise and a customer. It contains sensitive, personalized information that is exchanged in various formats and over various channels. In the digital era, electronic documents have replaced traditional physical documents. Electronic documents mimic the layout and formatting of physical documents and are easily readable by humans. PDF (Portable Document Format) is the de facto standard format for electronic documents because it can be viewed or printed like a physical document and can maintain content integrity through digital signatures. However, reading PDF on mobile devices can be uncomfortable because their screens are smaller than paper. HTML-based electronic documents are better suited to mobile environments, but they do not ensure content integrity as PDF-based electronic documents do. HTML is the main format for mobile environments, yet its content integrity is still at risk: HTML does not embed its resources and relies on external resources to display content. Additionally, a standard specification is necessary to ensure content integrity. The previously proposed Document HTML includes digital signatures to create a reliable electronic document [1]. Chained Document extends Document HTML with blockchain technology to ensure content integrity and delivery assurance [2]. However, these approaches have weaknesses in usability and compatibility because their meta tag declaration is based on the HTML comment tag and is therefore not readable. In addition, external resources can still be loaded through embedded CSS, which is a vulnerability. Furthermore, these approaches did not consider how reliable content is distributed in real-world scenarios by third-party electronic document delivery service providers. Therefore, in this paper, we improved the design of Document HTML by introducing a new style of meta tag and new conformance requirements.
Moreover, we investigated how Enhanced Document HTML works with certified document delivery services in the real world. We selected an actual electronic document and used it for experimental verification of Enhanced Document HTML. As a result, this paper proposes Enhanced Document HTML and a reliable delivery service that can be used in the real world. We review related research on electronic documents and digital signature technology in Section 2, and we improve Document HTML and investigate third-party electronic document delivery service providers in Section 3. We design an electronic document distribution system with Document HTML and certified document delivery services in Section 4 and present the experimental verification in Section 5. We discuss the limitations of this research in Section 6 and conclude with the value of this research in Section 7.

3. Enhanced Document HTML

3.1. Improvement of Document HTML

The previously proposed Document HTML and Chained Document have weaknesses in usability and content integrity, which the improved design of Document HTML addresses. First, Document HTML and Chained Document declare their extended tags using the HTML comment tag syntax, as shown in Figure 4. This limits the usability of the digital signature because a browser does not render an HTML comment tag. The Enhanced Document HTML meta tags are instead defined with the HTML <meta> tag, as shown in Figure 5, using extended keyword definitions, so they are more usable in a web browser. Because meta tags are located inside the <head> tag, the content to be digitally signed lies before and after the Enhanced Document HTML meta tags, as shown in Figure 6. The <ds-range> meta tag holds the byte positions of the target areas, as shown in Table 3. It consists of four consecutive hexadecimal values: the start and end offsets of the before-signature area, followed by the start and end offsets of the after-signature area.
Figure 5. Enhanced Document HTML meta tags.
Figure 6. Enhanced Document HTML structure with the digital signature.
Table 3. <ds-range> meta tag in the Document HTML.
Second, Enhanced Document HTML declares its document type with the standard DOCTYPE, whereas the previous research declared the Document HTML type in an HTML comment tag. This improves usability and makes it easier for web browsers to recognize the document type. Third, the CSS @import rule is not allowed. @import can link to an external style sheet, which contradicts the requirement that all resources of Enhanced Document HTML be embedded. If an external style sheet were modified without authorization, the content of Enhanced Document HTML could be displayed differently.
Last, Enhanced Document HTML does not allow scripts. The previously proposed Document HTML prohibited only asynchronous data loading, to avoid dynamic content loading. However, a synchronous script can also load data from an external source, which creates a content integrity vulnerability. Therefore, scripts are not allowed at all.
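The <ds-range> layout described above can be sketched in Python as follows. This is only an illustration of how a verifier might read the four offsets and extract the two signed areas. The whitespace-separated form of the value, the treatment of each end offset as exclusive, and the names SignedRanges, parse_ds_range, and signed_bytes are assumptions made for this sketch; the actual encoding is the one defined in Table 3.

```python
from typing import NamedTuple


class SignedRanges(NamedTuple):
    """Byte offsets of the two signed areas (hypothetical container)."""
    before_start: int
    before_end: int
    after_start: int
    after_end: int


def parse_ds_range(content: str) -> SignedRanges:
    """Parse four hexadecimal offsets from a ds-range value.

    Assumption: the four values are separated by whitespace.
    """
    parts = content.split()
    if len(parts) != 4:
        raise ValueError(f"expected 4 hexadecimal offsets, got {len(parts)}")
    ranges = SignedRanges(*(int(p, 16) for p in parts))
    if not (ranges.before_start <= ranges.before_end
            <= ranges.after_start <= ranges.after_end):
        raise ValueError("offsets must be non-decreasing")
    return ranges


def signed_bytes(document: bytes, ranges: SignedRanges) -> bytes:
    """Concatenate the before-signature and after-signature areas.

    Assumption: end offsets are exclusive, as in Python slicing.
    """
    if ranges.after_end > len(document):
        raise ValueError("range exceeds document length")
    return (document[ranges.before_start:ranges.before_end]
            + document[ranges.after_start:ranges.after_end])


# Toy document: 4 signed bytes, a 5-byte signature block, 4 signed bytes.
toy_document = b"AAAA<sig>BBBB"
toy_ranges = parse_ds_range("0 4 9 d")
print(signed_bytes(toy_document, toy_ranges))
```

Working on raw bytes rather than decoded text keeps the offsets stable regardless of how multi-byte UTF-8 characters are rendered.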

3.2. Definition of Enhanced Document HTML

Enhanced Document HTML is the proposed HTML specification that provides more robust content integrity so that an HTML file can act as a trusted electronic document. It is a restricted HTML specification that addresses the inability of HTML-based electronic documents to provide reliability. Based on the improvements above, it defines the following conformance requirements:
(a) Enhanced Document HTML must have a DOCTYPE declaration, as shown in Figure 7;
Figure 7. DOCTYPE declaration of Document HTML.
(b) Enhanced Document HTML uses UTF-8 encoding;
(c) All resources must be embedded; external resources are not allowed. The data URL scheme defined in RFC 2397 is used to convert external resources into internal resources, as shown in Figure 8;
Figure 8. Data URL Scheme.
(d) The CSS @import rule is not allowed. @import can link to an external style sheet, which creates a content integrity vulnerability;
(e) Scripts, such as JavaScript, are not allowed. A script can load data from an external source, which creates a content integrity vulnerability;
(f) Multimedia tags, such as <audio> or <video>, are not allowed. These tags are not essential for presenting the content of a document, and they could cause a file size problem;
(g) The <iframe> tag is not allowed. An <iframe> tag links to content from an external location, and that content is not part of the document, which creates a content integrity vulnerability;
(h) External resource containers, such as <object>, <embed>, or <param>, are not allowed. These tags link to non-HTML objects and contain device- or OS-dependent values, which makes them difficult to embed;
(i) The Enhanced Document HTML meta tags must be included to provide the content integrity verification feature, as shown in Figure 5. The <ds-range> tag identifies the byte areas of the Document HTML whose integrity is protected. The <ds-digest> tag holds the message digest of those areas, computed with a hash function. The <ds-signed-digest> tag holds the <ds-digest> value signed with the private key that corresponds to a PKI certificate. The <ds-cert> tag holds the public key and the certificate information needed to verify the <ds-signed-digest> value.
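Several of the conformance items above can be checked mechanically. The following Python sketch, which uses only the standard library, flags the prohibited tags of items (e) through (h), the @import rule of item (d), and resource attributes that do not use a data URL as required by item (c). It is an illustration rather than the validator used in this work: the class name ConformanceChecker, the function check_conformance, the set of resource attributes inspected, and the wording of the messages are all assumptions, and items (a), (b), and (i) are not covered.

```python
from html.parser import HTMLParser

# Tags prohibited by items (e) through (h).
FORBIDDEN_TAGS = {"script", "audio", "video", "iframe", "object", "embed", "param"}
# Attributes treated as resource references in this sketch (assumption);
# hyperlinks such as <a href> are deliberately not included.
RESOURCE_ATTRS = {("img", "src"), ("link", "href"), ("source", "src")}


class ConformanceChecker(HTMLParser):
    """Collects conformance violations while parsing an HTML string."""

    def __init__(self):
        super().__init__()
        self.violations = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag in FORBIDDEN_TAGS:
            self.violations.append(f"forbidden tag <{tag}>")
        if tag == "style":
            self._in_style = True
        for name, value in attrs:
            # Item (c): a resource must be embedded as a data URL.
            if (tag, name) in RESOURCE_ATTRS and value and not value.startswith("data:"):
                self.violations.append(f"external resource in <{tag} {name}>")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Item (d): @import inside embedded CSS.
        if self._in_style and "@import" in data.lower():
            self.violations.append("@import in CSS")


def check_conformance(html: str) -> list:
    """Return a list of violation messages; an empty list means none found."""
    checker = ConformanceChecker()
    checker.feed(html)
    checker.close()
    return checker.violations


sample = '<html><head><style>@import url("a.css");</style></head><body><p>x</p></body></html>'
print(check_conformance(sample))
```

A simple substring test for @import can report false positives, for example inside a CSS comment; a stricter checker would parse the style sheet itself.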
The Enhanced Document HTML specification turns an HTML-based electronic document into a single independent document with a content integrity verification feature. Because it uses a PKI certificate, it provides content integrity as PDF does. Enhanced Document HTML inherits responsive content presentation from HTML and CSS technology and gains document authenticity from the PKI certificate. This gives it an advantage over both plain HTML-based and PDF-based electronic documents. In addition, as an independent document format with embedded resources, it is suitable for long-term archiving. However, a legacy system has no protocol or system for generating and verifying Enhanced Document HTML. Therefore, an Enhanced Document HTML system is required that generates and verifies Enhanced Document HTML and integrates with a legacy system to deliver a reliable electronic document to a customer.
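Conformance item (c) relies on the data URL scheme of RFC 2397 to turn an external resource into an embedded one. A minimal Python sketch of that conversion is shown below. The function name to_data_url and the use of the file name to guess the media type are assumptions made for illustration, not a description of the generation system used in this work.

```python
import base64
import mimetypes


def to_data_url(data: bytes, filename: str) -> str:
    """Encode resource bytes as an RFC 2397 data URL with Base64 payload."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Fall back to a generic media type when the extension is unknown.
        mime = "application/octet-stream"
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# The resulting string can replace an external reference such as src="logo.png".
print(to_data_url(b"hello", "logo.png"))
```

Because the encoded bytes become part of the HTML file, they fall inside the signed byte areas and are covered by the same digest as the rest of the content.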

4. Design of Electronic Document Distribution Service Based on Enhanced Document HTML

4.1. Certified Electronic Document Intermediary

Distributing an electronic document in an enterprise involves two major processes. First, the electronic document content must be generated, which is called document generation. Second, the content must be delivered from the enterprise to a customer, which is called document delivery. Documents are delivered via various channels, such as email, by a document delivery service provider. A document delivery service provider has to provide a secure platform to protect sensitive personal information. Therefore, a government or a central consortium organization manages the qualification of document delivery service providers [11]. In Korea, the governing law is the “Framework Act on Electronic Documents and Transactions”, which includes a regulation for the “Certified Electronic Document Intermediary”. As of May 2022, there were fifteen certified electronic document intermediaries under the Korea Internet & Security Agency [11]. These intermediaries deliver digital content with user authentication, and they maintain the integrity metadata of the content. However, content integrity remains vulnerable because these services cannot verify the linked resources of HTML-based digital content.

4.2. Electronic Document Distribution with Document HTML

Document HTML is a single-file electronic document format that contains the metadata needed to verify content integrity. Therefore, it can secure content integrity when used with a document delivery service provider, namely the certified electronic document intermediary, as shown in Figure 9. After creating an electronic document, the Electronic Document Creator requests its conversion to Document HTML, which adds the content integrity verification feature. The Electronic Document Creator stores the Document HTML and sends its link to the certified electronic document intermediary. After user authentication by the certified electronic document intermediary, the customer opens the Document HTML and can view the content with the content integrity verification features.
Figure 9. The E-Document Delivery with Document HTML system.
Integrating the certified electronic document intermediary with the Document HTML system provides integrity for both delivery and content. A customer obtains delivery verification from the certified electronic document intermediary and content integrity verification from the Document HTML system. Therefore, this integrated process can be used by an enterprise or government that is legally required to send electronic documents. It can serve as a digital alternative to registered postal mail.

5. Experimental Verification

5.1. HTML Electronic Document for Experiment

In our previous research, we used a dummy sample statement. For this study, we chose the electronic tax notification document from the Korea National Tax Service. This electronic document is one of the highest-volume documents sent to citizens by the Korea National Tax Service and is delivered through a certified electronic document intermediary. It therefore lets us verify how Document HTML provides robust and reliable content integrity in a real-world scenario. The tax notification document can be opened after user authentication, when the certified electronic document intermediary redirects the user to a URL on the Korea National Tax Service website. Each such URL is personalized, contains the user authentication metadata, and is hard to predict, for security reasons. However, the tax notification document is based on HTML and linked to external resources, as shown in Table 4, so it has a content integrity vulnerability. The tax notification document is opened in a web browser with its linked external resources, as shown in Figure 10. There is no way to verify that the external resources are the originals.
Table 4. Resource list for the personal tax notification document from the Korea National Tax Service.
Figure 10. The Tax Notification Electronic Document.
The HTML tags in the tax notification document are shown in Table 5. All of them are standard layout-related tags, such as the <p> tag. The document contains no JavaScript and no <iframe> tag, both of which Document HTML prohibits for content integrity purposes.
Table 5. HTML Tags in the Tax Notification Document.

5.2. Generation of Document HTML

We generated Document HTML from the sample tax notification document, and the generation result is shown in Table 6. Generation succeeded with no conformance violations. The eleven external resources were converted into internal resources, and the tax notification document was generated as a single Document HTML file. The file size increased from 1,833,435 bytes to 2,462,018 bytes because of the Document HTML metadata and the Base64 encoding of the internal resources. However, this file size is almost the same as that of the tax notification document saved as a PNG image for personal archiving purposes.
Table 6. The Document HTML Generation Result.
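The Base64 share of this growth can be estimated with a short calculation. Padded Base64 encodes every group of 3 input bytes as 4 output characters, so an embedded resource grows by about one third. In the sketch below, base64_length is an illustrative helper name, and the two size constants are the values reported above; the sketch does not attempt to separate the share contributed by the Document HTML metadata.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length in characters of padded Base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check of the formula against the standard library encoder.
for n in (0, 1, 2, 3, 4, 1000):
    assert base64_length(n) == len(base64.b64encode(bytes(n)))

ORIGINAL_SIZE = 1_833_435       # bytes, original document with its resources
DOCUMENT_HTML_SIZE = 2_462_018  # bytes, generated Document HTML

growth = DOCUMENT_HTML_SIZE - ORIGINAL_SIZE
print(f"growth: {growth} bytes ({growth / ORIGINAL_SIZE:.1%} of the original)")
print(f"Base64 overhead for a 3000-byte resource: {base64_length(3000) - 3000} bytes")
```

The overall growth of roughly 34% is close to the one-third overhead expected from Base64 encoding.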
The Document HTML has the same content as the original tax notification electronic document, and all of its resources are internal, as shown in Figure 11. Moreover, the Document HTML meta tags are inserted to allow content integrity verification, as shown in Figure 12. A customer who doubts the document’s originality can use this metadata to verify the content integrity in the verification menu of the Document HTML system.
Figure 11. The Tax Notification Electronic Document based on Document HTML.
Figure 12. HTML sources of the Tax Notification Electronic Document based on Document HTML.

5.3. Verification of Document HTML

We verified the Tax Notification Electronic Document generated in the previous section and, for comparison, tampered copies of it. We obtained the verification results via the verification web menu, as shown in Figure 13. Figure 13a shows that the Tax Notification Electronic Document is valid and retains content integrity, using the digital certificate issued by Let’s Encrypt. Figure 13b shows a verification failure caused by content that was altered after the document was generated in Document HTML format. Figure 13c shows a verification failure caused by an altered digital signature.
Figure 13. Verification example of the Tax Notification Electronic Document. (a) Verification result of valid document; (b) Verification result of invalid document; (c) Verification result of invalid digital signature.
Thus, a document receiver who has to confirm the integrity of the Tax Notification Electronic Document based on Document HTML can verify it, because Document HTML is a single HTML file with digitally signed extended meta tags.
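The three outcomes in Figure 13 can be reproduced in outline with the following Python sketch. It is not the verification service used in the experiment. SHA-256 is an assumed hash function, since the text only specifies a hash function. The signature check is passed in as a function, and the demonstration uses an HMAC with a fixed key purely as a standard-library stand-in for verification with the public key in <ds-cert>; a real verifier would use a PKI library instead. All names in the sketch are hypothetical.

```python
import hashlib
import hmac
from typing import Callable


def verify_document(signed_area: bytes,
                    stored_digest_hex: str,
                    signed_digest: bytes,
                    verify_signature: Callable[[bytes, bytes], bool]) -> str:
    """Classify a document as valid, altered in content, or badly signed.

    signed_area: bytes covered by ds-range
    stored_digest_hex: value of ds-digest (assumed hexadecimal SHA-256)
    signed_digest: value of ds-signed-digest
    verify_signature: checks signed_digest against the stored digest
    """
    stored = bytes.fromhex(stored_digest_hex)
    if not verify_signature(stored, signed_digest):
        return "invalid signature"
    recomputed = hashlib.sha256(signed_area).digest()
    if not hmac.compare_digest(recomputed, stored):
        return "content altered"
    return "valid"


# Stand-in for PKI signing, for demonstration only (NOT a digital signature).
DEMO_KEY = b"demo-key"


def demo_sign(digest: bytes) -> bytes:
    return hmac.new(DEMO_KEY, digest, hashlib.sha256).digest()


def demo_verify(digest: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(digest), signature)


area = b"<html>tax notice</html>"
digest_hex = hashlib.sha256(area).hexdigest()
signature = demo_sign(bytes.fromhex(digest_hex))

print(verify_document(area, digest_hex, signature, demo_verify))
print(verify_document(area + b"x", digest_hex, signature, demo_verify))
print(verify_document(area, digest_hex, signature[::-1], demo_verify))
```

Checking the signature before recomputing the digest means that a forged digest value is rejected even when it matches the altered content.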

6. Discussion and Limitations

Enhanced Document HTML is a digitally signed form of HTML. As there is no way to verify the content integrity of external resources in HTML, we proposed embedding all related resources and defined conformance requirements to remove content integrity vulnerabilities. In addition, Enhanced Document HTML inherits responsive content presentation from HTML and CSS technology. This means that Enhanced Document HTML offers both a flexible content layout for various devices, including mobile devices, and document authenticity. Enhanced Document HTML needs to work with third-party solutions in a service-oriented manner because this helps them generate reliable electronic documents [12], as shown in Figure 14. Most enterprise solutions are unified via the data integration layer, and each solution can generate a reliable electronic document with Enhanced Document HTML via the present integration layer [13].
Figure 14. Data and present integration layer: basic architecture.
However, Enhanced Document HTML imposes strict definitions to provide a reliable electronic document, which limits its ability to offer a rich interactive user experience. For example, dynamic content loading from a server is not allowed in Enhanced Document HTML. Therefore, Enhanced Document HTML is useful in domains that need to send electronic documents containing static content with content integrity.

7. Conclusions

All enterprises and governments provide customer service. In the digital era, the electronic document is an essential communication tool for delivering content to a customer in place of physical documents. Communication over physical channels is well established, and regulations have been developed to maintain content and distribution integrity. PDF is the de facto standard electronic document format for replacing a physical document, and its specification has been improved to provide content integrity. However, electronic documents based on HTML, the prevalent language in web and mobile environments, have no standard content integrity verification feature, which leaves them vulnerable. We enhanced Document HTML to create a reliable single-file electronic document with a content integrity verification feature and better usability in web browsers. We studied the generation and distribution of electronic documents with a certified electronic document intermediary, and we designed an electronic document distribution service that uses both a certified electronic document intermediary and Document HTML in a real-world scenario. Additionally, we examined the tax notification document from the Korea National Tax Service and used it for experimental verification. We confirmed that electronic documents based on Document HTML are usable with third-party electronic document delivery service providers and provide a content integrity verification feature, so that a customer can be sure that an electronic document based on Document HTML has content integrity. We expect Document HTML to be used by enterprises and governments to deliver reliable electronic documents with legal effect and to avoid legal disputes. In future research, we will continue to design a service-oriented architecture so that Document HTML can become one of the solutions in enterprise systems for providing reliable electronic documents.

Author Contributions

Writing—original draft, H.-C.H.; Writing—review & editing, W.-J.K.; Supervision, W.-J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Seoul National University of Science and Technology grant number 2020-0643.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hwang, H.C.; Kim, W.J. Design and Implementation of Document-HTML System for an Authorized Electronic Document Communication. J. Adv. Eng. Technol. 2021, 14, 61–73.
  2. Hwang, H.C.; Kim, W.J. Design of Chained Document HTML Generation Technique Based on Blockchain for Trusted Document Communication. Electronics 2022, 11, 1006.
  3. Warnock, J.E.; Geschke, C. Founding and Growing Adobe Systems, Inc. IEEE Ann. Hist. Comput. 2019, 41, 24–34.
  4. PDF Association. PDF Specification Index. 2022. Available online: https://bit.ly/3bpyZV1 (accessed on 25 May 2022).
  5. HTML5. 2022. Available online: https://html.spec.whatwg.org/ (accessed on 25 May 2022).
  6. Lee, B.-K. HTML specification and semantics analysis of Korean news sites. J. Digit. Contents Soc. 2017, 18, 949–959.
  7. Kaczmarczyk, A.; Zabierowski, W. The Comparison of Native and Hybrid Mobile Applications for Android System. In Proceedings of the 2021 28th International Conference on Mixed Design of Integrated Circuits and System, IEEE, Lodz, Poland, 24–26 June 2021; pp. 290–293.
  8. Long, S. A Comparative Analysis of the Application of Hashing Encryption Algorithms for MD5, SHA-1, and SHA-512. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2019; Volume 1314, p. 012210.
  9. Jun-Ho, S.; Sung-Su, K.; Seog, J.M. Diffie-Hellman Based Asymmetric Key Exchange Method Using Collision of Exponential Subgroups. Korea Information Processing Society. Softw. Data Eng. 2020, 9, 39–44.
  10. Dasso, A.; Funes, A.; Riesco, D.; Montejano, G. Computing Power, Key Length and Cryptanalysis. An Unending Battle? arXiv 2020, arXiv:2011.00985.
  11. E-Document Integration Support Center. Certified Electronic Document Intermediary. Available online: https://bit.ly/3y1shw8 (accessed on 25 May 2022).
  12. Górski, T. UML Profile for Messaging Patterns in Service-Oriented Architecture, Microservices, and Internet of Things. Appl. Sci. 2022, 12, 12790.
  13. Petrasch, R.J.; Petrasch, R.R. Data Integration and Interoperability: Towards a Model-Driven and Pattern-Oriented Approach. Modelling 2022, 3, 105–126.
