The Evolution of the Concept of Semantic Web in the Context of Wikipedia: An Exploratory Approach to Study the Collective Conceptualization in a Digital Collaborative Environment

Wikipedia, as a “social machine”, is a privileged place to observe the collective construction of concepts without central control. Based on Dahlberg’s theory of concept, and anchored in Hjørland’s pragmatism, in which concepts are socially negotiated meanings, we analyzed the evolution of the concept of semantic web (SW) in the English version of Wikipedia. An exploratory, descriptive, and qualitative study was designed, in which we identified 26 different definitions (between 12 July 2001 and 31 December 2017). Eight of these are particularly relevant because of their duration, and the last two are those recorded at the end of the analyzed period. According to these, the SW “is an extension of the web” and “is a Web of Data”; the latter, used as a complementary definition, is linked to Berners-Lee’s publications. In Wikipedia, the evolution of the SW concept appears to be driven by the pursuit of non-technical vocabulary and by the control of authority exercised through debate. As a space for the collective negotiation of meanings, Wikipedia can be studied to gain relevant insight into how a community understands a particular concept and how that understanding evolves over time.


Introduction
Wikipedia can be described as one of the "abstract social machines" envisioned by Berners-Lee and Fischetti [1]: processes enabled by the World Wide Web (WWW) in which people do the creative work and the machine does the administration. The concept includes the software and systems framework that supports it, as well as the rules, policies, and organizational structure governing the participation of the actors in the same "machine" [2]. In the case of Wikipedia, the massive number of collaborators (more than 32 million registered users 1) supports the hypothesis that it is the most comprehensive project in the scope of Digital Humanities [3]. Its dynamics have made it a field for investigating the interaction between humans and computational artefacts from several perspectives, such as sociological [4,5], informational [6,7], or educational [8,9]. A systematization of the research areas of Wikipedia-related studies can be found in Tramullas' work [10].
Despite the sheer number of works on Wikipedia, a comprehensive view of their focus can be obtained from extensive literature reviews [11][12][13], or from platforms such as WikiLit or WikiPapers 2 that collect these works. Among the issues debated around Wikipedia is its relationship with academia. This relationship appears to be changing, from distrust and rejection of its use to cautious acceptance or even open adoption, namely as a pedagogical tool [14,15]. Contributing to this change of attitude are studies that point to the credibility of the information contained in Wikipedia [16,17], as well as published experiments on its use in academia [18,19].
Since Wikipedia presents itself as a free encyclopedia that any Internet user can edit, it can be considered a place where the collective construction of knowledge occurs. In this context, the present work focuses on the evolution of a collaboratively constructed concept, as represented in the corresponding Wikipedia entry. Collective knowledge is understood here in the sense given by Scardamalia and Bereiter, that is, public knowledge available to be managed and used by others [20]. Just as the collective construction of knowledge is limited to its observable externalization, the collective construction of a concept is restricted here to its verbal definition, expressed as collaboratively written statements. Although this may be seen as a reductionist view, we believe that these verbal externalizations are the means by which a group of people can jointly build a concept. We consider that this point of view fits with Hjørland's view that "concepts are dynamically constructed and collectively negotiated meanings" [21].
The semantic web concept was chosen for analysis because we observed instability and a lack of consensual definitions over time, even within the community in which it originated, the field of Computer Science. A previous study, focused on the statements of the World Wide Web Consortium (W3C) and the works of its director, Berners-Lee, showed that "the concept of Semantic Web is ambiguous and misinterpreting, given its biasing connection with the term 'semantics' and the association to other terms such as Web of Data, Linked Data or even Web of Linked Data" [22]. That study describes the terminological and conceptual metamorphosis of the definition of semantic web expressed in the documents analyzed. This unstable status of the concept makes its space of collective construction, materialized in the corresponding Wikipedia entry, conducive to the debate and negotiation of different personal perspectives. The existence of the previous study also allows a comparison between the perspective described there and that of the editors of the Wikipedia article under analysis. We consider this approach a relevant contribution both to the investigation of the relationship between Wikipedia and academia and to the field of Knowledge Organization, by providing empirical input for the theoretical study of concept theories.
We therefore intend to analyze the evolution of the semantic web concept in the English version of Wikipedia, treating it as a context of collective knowledge construction. To this end, the specific objectives are: (i) to collect the different definitions presented in the "introductory section" of the Semantic Web article, from December 2001 (date of creation of the article) to December 2017; (ii) to analyze the collected definitions in relation to the concept in question; and (iii) to compare the definitions diachronically, both with one another and with the analysis of the same concept based on the publications of Berners-Lee and the W3C.
The next section of this paper presents a brief theoretical framework for the current research. Section 3 describes the methodology used, Sections 4 and 5 present the results and the discussion, respectively, and Section 6 summarizes the main conclusions.

Background
Despite all the controversy surrounding Wikipedia, in particular regarding the quality of its information [23,24], there have been calls for the academic community to recognize its importance, or even the inevitability of its use, as a means of scientific dissemination aimed at a wider audience [13,19,25]. As Nielsen points out: "Universities expect researchers to make their work more widely known, and extending Wikipedia is one way to spread both researchers' work as well as ordinary information seekers" [13].
The use of Wikipedia as a platform for scientific publication is restricted by its no original research (NOR) rule 3, which rests on the premise that published sources are needed to attest to the reliability of the information introduced. However, Wikipedia's dynamics have been pointed out as "the potential model for more rapid and reliable dissemination of scholarly knowledge" [26], as exemplified by the wiki-based scholarly publishing project Species-ID 4. In addition to the NOR rule, another issue that may discourage experts from scientific writing on Wikipedia is the absence of "academic reward" [13]. One way of dealing with both issues is the approach of the journal RNA Biology, which requires authors of articles on new RNA families to submit them together with a draft of a corresponding Wikipedia entry, which then cites the original article [27].
Although there are some points of contact between the academic environment and Wikipedia, their production dynamics are quite different. Several comparative studies involving Wikipedia have focused on the quality of its information content (cf. Table 1), but few have addressed the evolution of the concepts presented. The focus on concepts arises in a different context, in studies that use Wikipedia as a source of textual data in order to extract conceptual relations for use in information retrieval, natural language processing, and ontology construction [28].
Table 1 (cont.):
… The authors: "Our study suggests that Wikipedia is an accurate and comprehensive source of drug-related information for undergraduate medical education." [44]
2014 | Biographies of scientists (400 entries) | The authors: "We also did not find any evidence that the scientists with better WP representation are necessarily more prominent in their fields [...]. In each of the examined fields, Wikipedia failed in covering notable scholars properly."
Regarding the study of concepts, there is no consensus on their nature (are they mental representations or abstract entities?) or on their constitution (are they bundles of features, or do they embody mental theories?). Different approaches, derived from Philosophy, Cognitive Science, or Linguistics, have resulted in distinct theories, the most prominent being the Classical Theory, the Prototype Theory, the Neoclassical Theory, the Theory-Theory, and Conceptual Atomism. According to Margolis and Laurence, all of them have difficulty explaining certain aspects of concepts, including issues related to analyticity, compositionality, and ignorance and error [45]. For these authors, concepts are mental representations, and a theory with the necessary explanatory power is only possible if one "admits different types of conceptual structure while tying them together by maintaining that concepts have atomic cores" [46].
From a Knowledge Organization perspective, we adopt a pluralist epistemological position that combines the distinct perspectives of two prominent authors in the field: Hjørland's pragmatic position, based on the Theory-Theory, and Dahlberg's "theory of analytical concept of reference", within a neoclassical epistemic position. Unlike Hjørland, Dahlberg does not consider the influence of the social context on the formation of concepts, but she takes it into account when it comes to their organization and representation [47]. In this respect, Dahlberg's theory of concept comes close to Hjørland's position regarding the representation of concepts, so that it provides a reference for the characterization, categorization, and decomposition of concepts [48].

Materials and Methods
To fulfil the defined objectives, an exploratory/descriptive qualitative study was designed, following an observational/comparative methodology [49]. For the empirical component of the study, we chose the English version of Wikipedia, since English is the language of the W3C and Berners-Lee reference documents on the semantic web. The "history" 5 of the Semantic Web entry was mapped to identify the semantic changes made to the statement supporting the definition, presented in the "introduction" of the different versions of this Wikipedia article. During the analysis, the "discussion" page 6 was consulted whenever necessary to obtain contextual information that helped clarify the definitions presented.
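The study mapped the revision history through the article's web "history" page. As an illustration only, and not the authors' procedure, the same metadata could be requested from the public MediaWiki Action API (assumed endpoint `https://en.wikipedia.org/w/api.php`, using the `prop=revisions` module). The helper name `build_history_query`, the default limit, and the lower date bound are hypothetical choices; the sketch only builds the request URL and makes no network call.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public MediaWiki Action API endpoint for English Wikipedia (assumption:
# the study itself consulted the web "history" page, not the API).
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_history_query(title: str, start: str, end: str, limit: int = 50) -> str:
    """Return a URL requesting the revision history of `title`, oldest first.

    `start` and `end` are ISO 8601 timestamps. With rvdir=newer the API
    enumerates from `start` forward to `end`, so `start` must be the earlier one.
    Hypothetical helper for illustration.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        # Metadata needed to date each change and attribute it to an editor;
        # "comment" holds the edit summary quoted in the discussion.
        "rvprop": "ids|timestamp|user|comment",
        "rvdir": "newer",
        "rvstart": start,
        "rvend": end,
        "rvlimit": str(limit),
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # 2001-01-01 is a conservative lower bound covering the analyzed period.
    url = build_history_query(
        "Semantic Web", "2001-01-01T00:00:00Z", "2017-12-31T23:59:59Z"
    )
    print(url)
    # Decode the query string again to inspect what is being requested.
    print(parse_qs(urlparse(url).query))
```

A full history would exceed one page of results, so a real client would need to follow the API's continuation parameters and send a descriptive User-Agent; identifying which revisions change the *meaning* of the definition remains a qualitative judgment, as in the study.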
As the analytical technique, categorization "by collection" was applied; that is, the categories emerged from a progressive classification of the units by analogy [50]. Subsequently, a procedure based on "time-series analysis" [51] was applied to the content units in each category for diachronic comparison. The conceptual analysis focused on identifying generic terms and their specifying characteristics [48] in order to compare the collected definitions. When determining generic terms, we avoided compound terms for the sake of simplicity.
Where the definitions involve evaluative terms or contextual interpretation (on the "discussion" page and in the descriptions attached to the respective changes), we drew on the analysis of discursive strategies, in particular predication, intensification, and attenuation, insofar as they provide indicators of how characteristics are valued and of the attitudes and positions of stakeholders [52].

Results
There were 129 changes to the introductory part of the Wikipedia entry titled Semantic Web, in which 26 definitions with some degree of semantic difference were identified (the corresponding statements are listed in Table A1, Appendix A). Table 2 presents the definitions grouped within each category, according to their generic term.
The list of generic terms makes one exception, the compound term "web of data", which was considered necessary because of the syncategorematic nature of the element "of data" [53] and because it is needed to convey the meaning intended by the term.
Two categories, "main definition" and "complementary definition", were needed because two or three definitions coexisted in some versions of the Semantic Web entry. In these cases, the analysis of their statements revealed two patterns: in one, the definition is attributed to Berners-Lee (subcategory 2.1); in the other, it is related to the common usage of the term (subcategory 2.2). Units #01 and #02 (cf. GT (a)) were placed in category 1, main definition, despite their close relationship with Berners-Lee, because they are the only definitions in these early versions of the article. The temporal distribution of the groupings by generic term (see Table 2) is presented in Figure 1.
The diachronic visualization provides an informative overview of the evolution of the semantic web concept in the Wikipedia context. Given the long time span (December 2001 to December 2017), short-lived definitions are naturally less visible, as is the case with (d), (e), and (j), each of which lasted less than 10 days.
The analysis of the definitions revealed conceptual variations resulting from the introduction or alteration of the specific characteristics attributed to the generic term (see Table 3). In some cases, the conceptual drift occurs only in the qualifiers, as in group (b) of Table 3, where a single generic term, "project", shows three variations: first the project is qualified as "current" (#03), then as "underway" (#04), and finally it loses its qualifier (#05).
The inverse situation occurs in groups (c) to (g) of Table 3, where the specifying characteristics serve as a link between different generic terms. The shift between these five terms appears gradual because it is framed by specifiers that are kept or only slightly altered, such as membership of the WWW. Another example is the change from the term "evolution" (#08) to "framework" (#09 and #10), where the former becomes part of the specifying characteristics of the latter: an "evolving framework". This specifier, "evolving", also accompanies the following three terms: "set of initiatives" (#12), "extension" (#13), and "development" (#12).
The definitions of the semantic web concept identified in Wikipedia were also compared diachronically with those resulting from the analysis of the same concept in the publications of Berners-Lee and the W3C. For the sake of clarity and representativeness, we restricted this comparison to variations lasting more than 90 days and excluded the two complementary definitions of common use (subcategory 2.2), since they would only add "noise". Applying these criteria results in eight main definitions and two complementary definitions (Figure 2).
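The selection criteria just described (keep variations lasting more than 90 days, leave out subcategory 2.2) can be stated as a small computation. This is a minimal sketch under stated assumptions, not the authors' procedure: each definition is assumed to be recorded with its first day in the introduction and the day it was replaced, and the `Definition` records, unit labels, and dates in `SAMPLE` are hypothetical placeholders, not data from the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Definition:
    unit: str          # content-unit label (hypothetical labels below)
    generic_term: str  # generic term assigned during categorization
    category: str      # "1" main, "2.1" attributed, "2.2" common use
    start: date        # first day the statement appeared in the introduction
    end: date          # day it was replaced, or the end of the analyzed period

    @property
    def duration_days(self) -> int:
        return (self.end - self.start).days


def select_for_comparison(defs, min_days=90, excluded=frozenset({"2.2"})):
    """Keep definitions lasting strictly more than `min_days`,
    dropping any whose category is in `excluded`."""
    return [
        d for d in defs
        if d.duration_days > min_days and d.category not in excluded
    ]


# Hypothetical records for illustration only; NOT the study's data.
SAMPLE = [
    Definition("#A", "vision", "1", date(2003, 1, 1), date(2004, 6, 1)),
    Definition("#B", "project", "1", date(2004, 6, 1), date(2004, 6, 8)),
    Definition("#C", "web of data", "2.1", date(2010, 3, 1), date(2017, 12, 31)),
    Definition("#D", "usage", "2.2", date(2009, 1, 1), date(2012, 1, 1)),
]

if __name__ == "__main__":
    for d in select_for_comparison(SAMPLE):
        print(d.unit, d.generic_term, d.duration_days)
```

Here the 7-day variant (#B) is dropped for being too short and #D is dropped as a common-use definition. The same `duration_days` values would also flag the short-lived variants (under 10 days) that are barely visible in Figure 1.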
Two situations stand out in the temporal distribution presented in Figure 2. The first concerns the variations of the main definition with the generic terms "vision" and "project", which coincide with the period in which the publications contain definitions with terms such as "logic", "understanding", "knowledge", or "meaning" (αω). The second concerns the term "web of data", which appears both in the main definition (in 2011) and in the complementary definition (in 2010), after it was used explicitly (in 2009) in the analyzed Berners-Lee/W3C publications.
Another potential relationship can be examined by considering the descriptions in the previously analyzed Berners-Lee and W3C publications. For this purpose, Table 4 reproduces the content units of the cited study [22].
In Table 4, the term "extension" is used to define the semantic web at two points in time. It first appears in two documents (from 2001 and 2002, subgroup 1.b.), very close to the beginning of the Wikipedia article (December 2002), and again in August and September 2006 (subgroups 3.b. and 2.b., respectively). The same term was used in the Wikipedia definitions in February 2007 ("an evolving extension"), very close to its second occurrence in the publications.
Whereas the definition of the semantic web as the "Web of Data" is found in both sources, we found nothing in the Wikipedia definitions that could be read as the "Web of Linked Data", a term that appears explicitly in two publications in Table 4, from 2006 and 2015 (subgroup 2.a.).

Discussion
The concept of the semantic web presented in the corresponding Wikipedia entry shows an evolution that seems to oscillate between the search for a more concrete definition and the use of terms accessible to the lay reader. Editors' explanations of the need to adapt the vocabulary to non-specialist users can be found both in the descriptions of the changes (available in the article history) 7 and on the discussion page. Examples of the former are "skewed the defn [sic] to an outsider's (web user's) point of view" (Vanished user kijsdion3i4jf, 23 February 2008) and "Query users by better explaining 'to web of data that can be processed by machines'" (Quercus solaris, 13 October 2017); an example of the latter is "WP content is intended for a 'general audience', the wording should reflect that" (dr.ef.tymac, 20 February 2007) 8.
Although the evolution of this concept shows points of contact and similarity between the two scopes (Wikipedia and the publications of Berners-Lee and the W3C), the differences detected go beyond those imposed by the type of medium: a continuum in the first case (the article remains permanently open, since any contribution can be reverted at any time) and discrete units in the second (closed to changes once published). The present study leads to the conclusion that the effort of Wikipedia editors to adapt to non-specialist readers marks a significant difference between the two scopes. This adaptation may also give rise to the need for additional definitions, which make it possible to present more than one point of view on the same concept in an integrated form.
The search for a clearer and more specific definition is, we believe, responsible for the elimination of dubious expressions or buzzwords 9. In some changes to the article, this attempt at clarification is stated explicitly, as on 21 November 2011, when the segment "that facilitates machines to understand the semantics, or meaning, of information on the World Wide Web" was removed from the definition and described as "obscure" 10. Likewise, in the change from the generic term "project" to "framework", and in the change from the latter to "extension", we can identify this dual intention of clarifying and adapting to non-specialist readers. This belief is reinforced by the debate around the last change (from "framework" to "extension") on the discussion page, where the editors can be seen seeking a balance between their personal understandings of the concepts involved and the needs of general readers. This discussion is not the only example of negotiation over the terms to be used in the definitions found on the discussion page. On the other hand, the history of the Semantic Web entry shows no occurrence of the repeated and systematic alternation between versions known as "edit wars" [54], which can be observed in several other Wikipedia entries.
In fact, the authorship of the changes to the definition presented in the Semantic Web article is characterized by debate and diversity. The 26 recorded definitions involve 16 different registered users and four unregistered users. Moreover, users credited with more than one definition contributed them in the same edit, with definitions falling into different categories: one main definition and one attributed and/or common-use definition (see Table A1 in Appendix A). The only exception, recorded on 20 February 2007, occurred in the context of what could have become an "edit war" between two editors (Dreftymac and Cygri). However, the debate was moved to the appropriate channel, the discussion page, where both editors predominantly sought to negotiate a consensus between their two different views. In this negotiation, the awareness that the semantic web concept can take on multiple meanings for different people is evident: "we deal with a much-hyped term that is used to mean quite different things by different people" (Cygri, 21 February 2007) 11.
7 In: https://en.wikipedia.org/w/index.php?title=Semantic_Web&action=history.
9 A buzzword is a word or expression that has become fashionable in a particular field and is being used a lot by the media (https://www.collinsdictionary.com/dictionary/english/buzzword).
10 Retrieved from Semantic Web: Revision history (dynamic URL).
Despite this, the last definition ("is an extension of the WWW") has remained stable for almost three years, alongside the definition attributed to Berners-Lee: "The term was coined by Tim Berners-Lee for web of data that can be processed by machines". The broad scope of the term "extension" may contribute to the stability of the definition, but it does little to specify the concept it is meant to define. From this point of view, the semantic web concept can be seen as being in a "pseudo-concept" phase, which, according to Vygotsky [55], is an intermediate stage between general or complex notions and the fully developed concept.
Another factor that may discourage changes to the definition is the academic and professional connection of the author of the last definition to the semantic web. However, we do not give this influence much weight because, in Wesch's words: "Authorized information is not beyond discussion on Wikipedia, information is authorized through discussion" [15].

Conclusions
Regarding the relationship between Wikipedia and academia, the study points to the target audience as a relevant differentiating factor. The need to moderate the language in order to reach a wider audience implies that, even without the NOR rule, Wikipedia should be taken as a complementary, rather than an alternative, medium for scientific dissemination.
Given the characteristics of Wikipedia described and discussed throughout this paper, we can consider it a space for the collective negotiation of meanings, and it is therefore worth taking as an object of study for understanding how a community conceives a particular concept. This position is aligned with Hjørland's statement: "Concepts have been understood as socially negotiated meanings that should be identified by studying discourses rather than by studying individual users or a priori principles" [21]. In this context, this research presents an approach to the diachronic study of these discourses using Wikipedia as an information source, together with the features it provides. Despite Wikipedia's relevance to this study of the collective construction of meanings, further similar studies would be needed to understand the importance of this phenomenon within a more comprehensive process of "dictionaryization", in which the content of a concept is fixed by its definitions [38]. It is possible, however, to draw a parallel between the conceptual evolutionary dynamics inherent in the workings of Wikipedia and Derqui's assertion that "a social system is organized around definitions and redefinitions" [56].

Figure 1. Temporal distribution of the definitions (grouped by generic term).

Figure 2. Comparative temporal distribution of the definitions of semantic web from the two sources (Wikipedia and publications of Berners-Lee/World Wide Web Consortium (W3C)).

Table 1. Selection of Wikipedia quality studies.

Table 1. Cont.
… "While Wikipedia's massive reach in coverage means one is more likely to find a biography of a woman there than in Britannica, evidence of gender bias surfaces from a deeper analysis of those articles each reference work misses." [43]
… "Most Wikipedia articles representing the 10 most costly medical conditions in the United States contain many errors when checked against standard peer-reviewed sources." [43]
2014 | Textbooks (information for 100 drugs) | …

Table 2. Generic terms and respective content units retrieved from the identified definitions.

Table 3. Specific characteristics of generic terms.

Table A1. Definitions extracted from the "Semantic Web: Revision history".
… It derives from W3C director Tim Berners-Lee's vision of the WWW as a universal medium for data, information, and knowledge exchange.
… is an evolving extension of the WWW in which Web content can not only be expressed in natural language, but also in a form that can be understood, interpreted and used by software agents, …

Table A1. Cont.
… is an evolving development of the WWW in which web content can not only be expressed in natural language, but also in a form that can be understood, interpreted and used by software agents, …
… It describes methods and technologies to allow machines to understand the meaning – or "semantics" – of information on the WWW.
… Tim Berners-Lee defined the Semantic Web as "a web of data that can be processed directly and indirectly by machines".
… is the roadmap of a "man-made woven web of data" that facilitates machines to understand the semantics, or meaning, of information on the WWW.