Standardization Procedure for Data Exchange

Abstract: Common specification of data promotes data exchange among many and unspecified individuals and organizations. However, standardization itself tends to discourage innovation that can create new uses of data. To overcome this dilemma of innovation and standardization, this paper analyzes and proposes hypotheses regarding the process through which the World Wide Web Consortium (W3C) has realized innovations such as web applications by updating the standard. I hypothesize the following changes in standardization process management at the W3C as key factors supporting innovation through standardization among stakeholders with conflicting interests: (1) defining the scope of the specifications to be developed according to functions instead of technical structures; (2) design of a development management policy based on feedback from implementations, referred to as an “implementation-oriented policy”; (3) inclusion of diversified stakeholders in open standardization processes that facilitate consensus formation and the diffusion of developed standards; and (4) adopting a royalty-free policy to encourage third-party developers to implement proposed specifications and advance updates to proposals. This single-case analysis informs the development and diffusion of common technological data specifications, which are the driving factors for innovation utilizing big data generated by exchanging data of various origins.


Introduction
The development of conditions for data-driven management is being advanced due to global digitalization, including the use of artificial intelligence (AI). Legal system design, technical specifications, and the architecture are required to facilitate data exchange between various organizations.
Technology standards are required to analyze data generated by various entities in an integrated manner or to generate learning data for AI because systems do not work well without interoperability of data.
The data linkage layer is a set of reference architectures adopted by various countries in common. Standard specifications for achieving mutual availability are one of the most important elements in the data linkage layer.
While interoperability is essential for generating large amounts of learning data, it is challenging for a specific specification to be widely accepted and to serve as an effective standard. Most data are generated in compliance with various formats according to diverse contexts. Nevertheless, the standard development of data specifications is an essential element for providing an environment for AI utilization.
This paper proposes hypotheses of promotional factors for data exchange among diversified organizations and individuals through a qualitative analysis of an emerging use case of standardization among fiercely competitive players with different contexts at the World Wide Web Consortium (W3C).
A particular focus was placed on standards that realized web applications. HTML5, the newest version of W3C's standard, realizes "web applications," allowing applications such as spreadsheets to run on servers instead of on client hardware regardless of the type of browser used by the client (Figure 1). The architectural transition was led by Google. Google was established as a company to provide search engine technology and then found the advertising business to be a powerful revenue stream. Google launched Google Ads in 2000, and by 2010, 96% of Google's revenue came from advertising. Google sought to diversify its revenue with web applications such as Gmail and Google Maps. Although advertising remained Google's largest source of revenue through 2020, its share had declined to 83.9%, and the Google Cloud segment, including G Suite, generated 5.8% of Google's revenue in 2019. HTML5 is a specification required to diffuse web applications and make them an important revenue source for Google.
HTML5 works only when all components operate together according to the standards. Architectural transformation with standardization can be realized only with a consensus among all of the players in the Web domain.
Google's G Suite is a set of productivity tools, including a spreadsheet, a word processor and presentation software, that competes with Microsoft Office. Microsoft Office is developed as native applications that run on an operating system installed on local devices. The functions that Google intended to provide with web applications had already been provided by Microsoft as native applications. Introducing HTML5 therefore benefits Google and damages Microsoft.
Apple also provides productivity tools: Numbers, Pages, and Keynote. However, they are distributed for free. Apple welcomes the diffusion of web applications that run on browsers on macOS and iOS and has contributed to development and standardization activities (Figure 2).
Information 2020, 11, x FOR PEER REVIEW 2 of 10
The concept of web applications is based on an architecture wherein web browsers work as a runtime environment. Microsoft and Apple have been leading browser vendors, and the functions that web browsers should provide differ according to each company's strategy. There had been conflict regarding the role of web browsers, and these differences over the ideal role of browsers made it difficult to update the HTML standard. Microsoft eventually changed its stance toward HTML5 and launched a business with its own web applications, named Office Online. This change was not a spontaneous move by Microsoft but a passive response.
To promote data sharing and trading among various and diversified organizations and individuals, data must be generated according to common specifications, including ontology, vocabulary and syntax, so that they can be used and analyzed with AI in an integrated manner. Each organization has its own vocabulary and data model, and it is difficult to standardize such specifications. It is also difficult to motivate stakeholders to change data specifications for emergent and less understood applications such as AI.
HTML is a standard that has transformed the Web and has allowed it to be used for completely new purposes. It was difficult to reach consensus on the direction of the HTML update among stakeholders with conflicting interests. Standards for data transactions have features similar to those of HTML5 because learning data for AI is a new application, and its requirements for specifications differ from those of existing applications.

Role of Standards
There are various categorizations of standards, and each has different features. Researchers classify standards into quality/safety standards and interoperability/compatibility standards [1][2][3]. This paper focuses on compatibility standards. Compatibility standards exhibit strong network externalities. Therefore, it is difficult for technologically differentiated specifications to compete with de facto standards because the direct network effect causes a lock-in effect [4,5], and switching costs prevent users and complementary goods suppliers from adopting more effective or sophisticated specifications [6]. Excess inertia refers to being locked into a nonoptimal technology, such as the QWERTY keyboard [7]. Therefore, the first-mover strategy [8] is effective for compatibility standard setting. Proposers of standards for open systems tend to adopt a strategy of "priming" future expectations [9,10]. Compatibility is realized only with agreement among stakeholders, including on standards set by market competition. Therefore, standardization must involve a competition-cooperation interplay during multi-firm technology coordination [11].

Competition and Cooperation for Standard Setting
Some, though not all, standards play the role of platforms [12]. HTML, which used to simply be a mark-up language for static documents, has transformed into a runtime environment for web applications. HTML5 can be regarded as a platform because it provides functions commonly required for the operation of various web applications and realizes coordination among these functions [13,14]. A platform is a component commonly utilized by multiple complements [14,15].
Standard setting is a collaborative activity among diversified stakeholders. Most standardization processes are not conducted only through the official channels of standard-setting organizations. Both engineers who belonged to member companies of the W3C and independents have discussed the development and improvement of specifications. The development of HTML5 is a typical case of distributed collaboration [16].
Unlike ordinary joint ventures, stakeholders of standardization often have conflicting interests. The standardization of HTML5 also serves as a case of co-opetition [17,18] among Google, Apple, and Microsoft. Google's success in the standardization involved forming consensus among the members of the W3C and securing widespread support from outside the W3C. This transition of standards was realized through consensus building and cooperative development by stakeholders with conflicting interests. Platform development and standard setting for platform functions are typical multi-sided markets [19][20][21]. Most internet- and web-related compatibility standards are developed through coordination among web browser vendors, content developers for the internet/web and end users.

Dilemma between Interoperability and Innovation
The generation of big data by accumulating diverse data is expected to lead to the emergence of innovation. Since the original meaning of innovation is the new combination [22], it can be expected that the combination of data resources that have never been combined can promote innovation.
On the other hand, interoperability or convertibility is necessary to generate machine-readable data resources for AI analysis. Common technological specifications, in other words standards, play an important role in promoting data exchange for emerging applications.
However, standards also tend to inhibit innovation [23] because they work by reducing the variety of goods [24]. Moreover, excess inertia causes standardized specifications to be locked in to whichever ones spread first [25]. Interoperability encourages data integration among diversified sources. At the same time, standards discourage data owners from changing their original or industry-specific specifications. Standards make existing businesses designed according to industry-specific rules more efficient. However, they also interfere with the creation of innovation through new cross-industry data transactions.

Materials and Methods
A case study approach [26] is used in this research because there are few cases of standardization involving diversified participants. Since the phenomenon of interest is emerging and under-theorized, the inductive case study approach is suitable for this research [26,27]. This case study focuses on the continuous development of the standardization process management policy at the W3C. I conducted a comparative analysis of the development process for HTML5 and its earlier versions: HTML 3.2, HTML 4.0, and Document Object Model (DOM) Levels 1 and 2. DOM Levels 1 and 2 are earlier specifications of HTML5's application programming interface. This inductive hypothesis-building study attempts to develop generalizable conclusions from a rare event.
I conducted fieldwork at the W3C office in Japan as an intern from April 2010 to March 2013 and analyzed the flow of the standardization process as defined by the mailing list archives of the working groups, the meeting minutes, technical documents and the public relations materials. This analysis involved a study of internal documents and emails from the archives issued since the standard-setting organization was established. I also conducted interviews with individuals from the W3C staff and member organizations as well as with developers outside of W3C member organizations.

Case Study
HTML5, as a runtime environment for applications, plays the role of a platform. Therefore, research design must be based on the premise of the multi-sided market [19,20], which consists of not only standard setters, but also engineers as technology users outside of W3C and end-users. The architecture of web applications consists of nested technological layers, and the upper layers rely on the lower layers. This type of architecture is referred to as a layered modular architecture [28]. In some cases, a tying strategy to develop and complement the adjacent layer [29,30] is adopted. Tying helps to develop a competitive advantage and discourage competitors from entering the market [29,31] by monopolizing certain layers [32]. The leading players in HTML5 development are organizations conducting business in adjacent layers, such as operating systems (Microsoft and Apple) and web applications (Google). It is rational for these players to develop and promote proprietary technology for Web applications based on a tying strategy.

Failure (Fragmentation of Standards) in the Early Stage: HTML 3.2/HTML 4.0
Although the W3C was established in 1994, there was initially no formal procedure for its standardization process. The standardization activity for the substitute for HTML 3.0 was launched by volunteer engineers who had contributed to the HTML working group (WG) of the Internet Engineering Task Force (IETF). These engineers were invited to become members of the HTML Editorial Review Board (HTML-ERB), but they were not elected through a formal procedure because there was no official document governing the organization and the standardization processes at that time.
The W3C began to develop a policy for standardization process management in June 1996. To support that policy, the Process Editorial Review Board (Process-ERB) was established and began to develop the "Process Document," which governs the standardization process and organizational operation. HTML 3.2, the first standard developed at the W3C, consisted of specifications negotiated by browser vendors in closed discussions among a small number of representatives from the member organizations. The standard did not embrace the functions of emerging technologies. HTML 4.0 was developed in an HTML WG that was reorganized as the HTML-ERB working group. There were three steps for the specification documents: the working draft (WD), the proposed recommendation (PR), and the recommendation (REC: the W3C's standard). The WD could be advanced to the PR only after the specification was stabilized. The PR could be advanced to the REC with votes from all the member organizations.
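The three-step track described above can be read as a small state machine. The following is only an illustrative sketch, not a W3C artifact: the names `Stage` and `advance` are hypothetical, and the two boolean flags stand in for judgments (that the specification has stabilized, that the member vote has passed) which the source does not define in detail.

```python
from enum import Enum


class Stage(Enum):
    """Document stages in the early three-step track."""
    WD = "working draft"
    PR = "proposed recommendation"
    REC = "recommendation"


def advance(stage: Stage, *, stabilized: bool = False,
            members_approved: bool = False) -> Stage:
    """Return the stage a document may move to; stay in place otherwise.

    Both flags are assumptions of this sketch: they abstract how
    stability and the member vote were actually assessed.
    """
    if stage is Stage.WD and stabilized:
        return Stage.PR   # a WD advances only after the specification stabilizes
    if stage is Stage.PR and members_approved:
        return Stage.REC  # a PR advances on the vote of the member organizations
    return stage


stage = advance(Stage.WD, stabilized=True)
stage = advance(stage, members_approved=True)
print(stage.name)  # REC
```

Note that nothing in this early track asks whether anyone has implemented the specification, which is the gap the later process revisions addressed.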
Although HTML 3.2 and HTML 4.0 were developed under different standardization process rules, neither version prevented Microsoft or Netscape from implementing proprietary extensions in their own products, and thus, HTML remained fragmented. This fragmentation was also not resolved after a minor version upgrade from 4.0 to 4.01.

The Process Management Policies Clarified: DOM Levels 1/2
The procedures for DOM development were different from those of earlier HTML versions. Specification drafts were made available to the public. Discussion was conducted on three mailing lists, which were open respectively to (1) the WG chair, draft editors and representatives of the member organizations; (2) anyone from a member organization; and (3) the public. Feedback received in response to the inquiries sent to the public mailing list was adopted within the specifications. HTML 3.2 and 4.0 did not succeed in realizing convergence because the HTML WG tried to develop the standard by reverse engineering two proprietary extensions of the existing standard. To avoid the same failure, the DOM WG defined the scope of the specifications to be developed by functions instead of by technical structures. The WG set three levels of DOM specifications. Level 0 was defined based on functions that Microsoft and Netscape had already implemented in their browsers. Level 1 was defined based on functions that they planned to develop and implement in the next versions of their browsers (Internet Explorer 4/Navigator 4). Level 2 was defined as a specification with functions more advanced than those of Level 1. DOM was the first standard that was developed with no specification already implemented.
Level 1 was based on a proposal from Microsoft. The first draft specified that data be manipulated only with JScript and VBScript, which are Microsoft's proprietary programming languages. The specification was changed to be compatible with JavaScript, implemented by Netscape, after moderation by the editor from Sun Microsystems. However, Netscape still did not eagerly contribute to the standardization process. The conflict between Microsoft and Netscape was suddenly brought to an end when AOL acquired Netscape and made the source code of Navigator open source. DOM Level 1 was standardized after approval based on the state of competition in the market.

Introducing an "Implementation-Oriented Policy"
The W3C added a candidate recommendation (CR) phase between the WD and the PR in 1999. A WG is required to demonstrate the feasibility of a specification during the CR phase. This demonstration is described in the process documentation, which is excerpted below: "Show that each feature of the technical report has been implemented. Preferably, the Working Group should be able to demonstrate two interoperable implementations of each feature [33]." This means that no proposed specification is ever certified as a standard without at least two implementation cases. Therefore, specifications can be improved based on feedback from the implementations by developers other than WG members.
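The candidate recommendation requirement quoted above amounts to a per-feature check. The sketch below is one hypothetical way to express it; the function names, the dictionary layout and the feature and browser names are all assumptions made for illustration and do not describe any tool used by the W3C.

```python
MIN_IMPLEMENTATIONS = 2  # "two interoperable implementations of each feature"


def features_lacking_implementations(implementations, minimum=MIN_IMPLEMENTATIONS):
    """List features with fewer than `minimum` interoperable implementations.

    `implementations` maps a feature name to the set of implementers that
    are considered interoperable for it (how interoperability is judged is
    outside this sketch).
    """
    return sorted(
        feature
        for feature, implementers in implementations.items()
        if len(set(implementers)) < minimum
    )


def may_leave_candidate_recommendation(implementations):
    """True only if at least one feature is listed and none lacks implementations."""
    return bool(implementations) and not features_lacking_implementations(implementations)


# Hypothetical implementation report for a specification with two features.
report = {
    "feature-a": {"browser-x", "browser-y"},
    "feature-b": {"browser-x"},
}
print(features_lacking_implementations(report))    # ['feature-b']
print(may_leave_candidate_recommendation(report))  # False
```

Because the gate is evaluated feature by feature, a proposer has an incentive to recruit a second implementer for every feature, not only for the specification as a whole.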
The most distinctive feature of standardization at the W3C is the policy that WGs must call for implementation before specifications are considered to be stable. Most standards are developed and fixed first, and then compatible products or services are developed and supplied. In contrast, the W3C encourages stakeholders to implement the specifications under discussion and pushes the standardization process forward through discussion that incorporates feedback from the implementation cases. The process management policy requiring that implementation cases be supplied before specifications become stable is referred to as an "implementation-oriented policy."
The DOM Level 2 specification was split into modules, including DOM Level 2 HTML, on 27 September 2000. Netscape began to take the lead in the development process for DOM Level 2 HTML. Netscape installed one of its employees as the editor for the working draft of 7 December 2001 and later versions. Netscape also sent a testimonial regarding the specification when standardization was completed. Moreover, Netscape implemented the standard in Netscape 6, released in November 2000, before Microsoft did so with Internet Explorer 6 in August 2001. In other words, Microsoft, which had a larger market share for its web browser, had to catch up with Netscape in the implementation of new features. The specification developed through the initiative of a disadvantaged browser vendor turned out to be an effective standard.

Establishment of a Process Management Policy through Competition between Proposals: HTML5/XHTML
HTML5 was developed through a conflict between two proposals. One was XHTML, which was developed with Extensible Markup Language (XML) technology and lacked compatibility with HTML 4.01. The other was HTML5, which was designed to be compatible with the existing version of HTML and consisted of functions for web applications.
HTML5 was initially proposed by Mozilla, Opera Software and Apple. These companies proposed the specification to the W3C in April 2004. However, the W3C rejected this specification because it had already begun the standardization process of XHTML as the next version of HTML. Mozilla, Opera and Apple launched a specification development activity as a grassroots developers' community referred to as the Web Hypertext Application Technology Working Group (WHATWG) and continued to develop their specification separately. Thus, there came to be two candidate standards for the next version of HTML.
XHTML and its related specifications were developed through a closed process at the W3C. In contrast, WHATWG made its HTML5 development activity open to the public. Mozilla, Opera and Apple promptly implemented HTML5 in their browsers. Google, which is one of the largest web application service providers, hired engineers to work on the development of Mozilla Firefox. Google then reassigned these engineers to develop its own browser, named Chrome. Four browser vendors worked together to develop and improve the HTML5 specification and cooperatively implement it in their own products.
The organizations supporting XHTML did not succeed in increasing their implementation cases [34,35]. However, the browser vendors supporting WHATWG had taken measures to encourage developers outside of their organizations to learn HTML5 and create implementation cases. These types of activities are referred to as developer relations activities. WHATWG honored excellent programmers as experts in HTML5 and encouraged grassroots developers to organize communities to increase implementation cases and to acquire proficiency in the specification. The voluntary activities of WHATWG were derived from the implementation-oriented policy, which encouraged proposers to increase implementation cases and the number of programmers using the proposed specifications. HTML5 was chosen as the next generation of HTML because it was more broadly implemented. The process at the W3C HTML WG was open to the public as were the development activities of WHATWG.

Patent Policy
A royalty-free policy is necessary for the implementation-oriented policy to work because, without it, a third party cannot implement proposed specifications without a risk of infringing intellectual property rights.
The W3C has been managed according to a policy similar to open source because the Internet has been developed by engineers, especially at academic institutes, who tend to support freedom of software and believe that the Internet must be kept open.
Although the policy for intellectual property rights had not been clearly stated by the W3C initially, there was a shared recognition that the standardization process was implicitly managed under a royalty-free policy. However, the W3C faced litigation over intellectual property rights concerning specifications such as SVG and RDF as the web became more diffused and the number of specifications developed increased.
The W3C established the Patent Policy Working Group to discuss intellectual property-related issues in September 1997. A policy proposal that allowed RAND (reasonable and non-discriminatory) licensing as well as royalty-free licensing was initially presented. The proposal to admit RAND did not receive support, and the working group decided to continue to adopt only the royalty-free approach.

Discussion
The W3C incrementally developed its standardization process management policy (Figure 3). The changes in the revisions of the process document are summarized as follows (

Defining scope by function has an effect on processes, and all proposals to realize the same function are inevitably reviewed by a single WG; therefore, only one converged specification is certified as a standard. This mechanism works well for convergence.
No proposal can be certified as a standard without multiple interoperable implementation cases under a standardization process designed with an implementation-oriented policy. Therefore, proposers tend to open and promote their own implementation cases and to encourage others to use the proposed specification to increase the implementation cases.
For most existing standards, implementation cases are supplied after specifications have been fixed. At the W3C, content and services that are compatible with the developing standards are provided before the specifications become stable (Figure 5). The WG can accumulate feedback based on implementation cases and can improve specifications to better meet the needs of programmers.

Not even a perfect specification can become a standard without adoption. Any standard must consist of widely accepted specifications. The implementation-oriented policy is a method of standardization process management that evaluates proposals based on a number of implementation cases and feedback from developers. This policy increases the effectiveness of standards by spreading information about the proposed specifications, increasing the number of implementation cases, and improving standards based on accumulated feedback. In other words, the implementation-oriented policy quickly establishes effective standards using network externalities.
Opening the standardization process encourages engineers outside of the W3C to create implementation cases, which elicit feedback for improving specifications. This cycle enables the proposed specifications to be refined and diffused.

Conclusions
Through the case analysis of the update of the HTML standard, I discuss the institutional design to develop and diffuse a common data specification that is necessary for data sharing and trading among a variety of organizations and individuals. Big data for AI analysis can be generated through integration of diversified data sources. Common data models and technological standards help to convert data resources generated in different contexts.
The case analysis of HTML identifies and proposes as hypotheses four factors for the success of introducing common specifications: (1) defining the scope of the specifications to be developed according to functions instead of technical structures; (2) design of a development management policy based on feedback on implementations, referred to as an "implementation-oriented policy"; (3) inclusion of diverse stakeholders in an open standardization process that facilitates consensus formation and the diffusion of developed standards; and (4) adoption of a royalty-free policy that encourages cooperation among engineers from different organizations and enables the procurement of diverse sources of innovation.
Exchanges of data and the conversion of data models within each industry are prerequisites for publishing and sharing a wide variety of data. However, standards within industries can interfere with the development of, and migration to, standards across industries. Once a standard becomes widespread, excess inertia reinforces compatibility only within the industry or within a group of companies, and cross-industry availability is difficult to achieve. To promote data distribution on a larger scale and innovation using big data, it is necessary to break away from situations in which each industry is confined to its own specifications.
The four factors extracted from the HTML case are considered to contribute to an institutional design that overcomes this issue. AI is a powerful tool for realizing innovation that utilizes big data, and data for machine learning need to be machine-readable. Big data that consist of diversified resources without interoperability are worthless for AI. The hypotheses obtained from the case are relevant not only to institutional design in standardization, but also to the promotion of data exchange and the realization of innovation utilizing AI.