The Network of Early Modern Printers and Its Impact on the Evolution of Scientific Knowledge: Automatic Detection of Awareness Relationships

Abstract: This work describes a computational method for reconstructing clusters of social relationships among early modern printers and publishers, the most decisive agents in the transformation of scientific knowledge. The method is applied to a dataset retrieved from the Sphaera corpus, a collection of 359 editions of textbooks used at European universities and produced between the years 1472 and 1650. The method makes use of standard bibliographic data and fingerprints; social relationships are defined as “awareness relationships”. The historical background consists of the production and economic practices of early modern printers and publishers in the academic book market. The work concludes with historical case studies that empirically validate the method, their historical interpretation, and suggestions for further improvements by means of machine learning technologies.


Introduction
In the project The Sphere: Knowledge System Evolution and the Shared Scientific Identity of Europe (Sphere Project 2022) we investigate the processes of transformation, homogenization, and mathematization of scientific knowledge during the early modern period. We consider the homogenization of scientific knowledge in the early modern period to be a determinant, identity-shaping factor of European culture.
The homogenization of knowledge is only one possible result of knowledge transformation processes. When a student enrolls at a university nowadays to study mathematics, it does not matter whether the student enrolls in Paris, New York, or Seoul: that student will be introduced to the same knowledge, possibly even organized in the same way. We consider this the result of a historical process that can be traced back to its roots and, as concerns Western culture, to early modern Europe, when the overall connectivity of learned society was greatly increasing (Hotson and Wallnig 2019; van den Heuvel 2015; Vugt 2017). The present work shows how this epistemological process was deeply interwoven with the economic and commercial constraints bound to the production and distribution of textbooks. Hence, social networks among early modern printers and publishers have become relevant, which motivates the search for a method of disclosing such networks systematically.
To investigate these processes, we curated a specific electronic corpus of historical sources-namely, astronomy and cosmology textbooks used in universities across Europe between the end of the 15th century and the mid-17th century. These early modern printed books all contain, in different forms, a specific treatise on cosmology: Johannes de Sacrobosco's (d. ca. 1256) Tractatus de sphaera. First compiled at the University of Paris in the 13th century, Sacrobosco's textbook was used to teach a qualitative introduction to geocentric cosmology. As a modern text of reference we use Lynn Thorndike's 1949 critical edition (Thorndike 1949). The corpus also includes 127 university-level astronomy and cosmology textbooks that, while they do not contain Sacrobosco's treatise, include introductions to spherical astronomy that follow the same design as Sacrobosco's work, discuss identical subjects in the same order, and reference matching visual elements-at least in part. We denominated these textbooks "adaptions". In total, we identified 359 different printed editions dating from the years 1472 to 1650. This is our "Sphaera corpus".
While a general overview of the corpus (especially concerning the printing locations of each edition, book formats, authors, printers, and publishers, as well as the languages of the textbooks) has already been provided (Valleriani 2020b), two points are relevant here. First, although the text was printed until the 18th century, we decided to cap our research at 1650 because the relevance of the text as a scientific treatise at the universities decreased dramatically thereafter. Second, this corpus is particularly relevant because Sacrobosco's De sphaera and Georg von Peurbach's (1423-1461) Theoricae novae planetarum became the first mathematical texts ever printed, in 1472. Sacrobosco's treatise appeared twice in that year, a few months apart, in two different locations.
The treatise experienced enormous popularity in quadrivial teaching and in many other institutional and educational contexts, thereby also functioning as an agent of the codification of practical knowledge (Hamel 2004, 2006, 2014; Johnson 1953; Valleriani 2017). For this reason and because of the long timespan during which it was used, we consider this corpus, and the results obtained from its analysis, to be representative of the process of the transformation and homogenization of scientific knowledge in the early modern period. This is reflected in its study at the universities, where it became background knowledge common to students all over the continent, without which the processes of reception or rejection of new ideas (such as the Copernican worldview) cannot be fully grasped by historians.
To reconstruct the evolution of knowledge transformation in European universities, we decided to extract a series of data from the Sphaera corpus that we considered representative of the scientific content included in the textbooks. We call these data "knowledge atoms" in reference to the atomization of texts in commentary: atomization was a standard procedure used to create new scientific knowledge from antiquity until the end of the early modern period (Grafton 2013).

Text-Parts as Knowledge Atoms
After the creation of a semantic database applicable to any historical analysis of early modern corpora (CorpusTracer) (Kräutli et al. 2021; Kräutli and Valleriani 2018; Valleriani and Kräutli 2022), our analysis of knowledge atoms began with texts (text-parts) but will later include illustrations and astronomic computational tables. Using electronic copies of each source, the texts were carefully atomized into text-parts, which are defined as textual passages that are formally not shorter than a paragraph and cover a well-defined subject with completeness. A text-part in the Sphaera corpus, for instance, is the Theoricae novae planetarum by Peurbach (Malpangotto 2021). This text was first included in the Sphaera treatises as early as 1482, and by 1537 it had been reprinted 11 times in different editions, and another 22 times as a reference text on which other scholars commented. If literary compositions, ordinarily printed in scientific books beginning in the 16th century, are considered, a text-part may be much more modest in length. A representative example might be the short Carmen written by Donato Villalta (1510-1560) and dedicated to the scholar Pierio Valeriano (1477-1560), which was printed for the first time in 1537 and reprinted over 34 times thereafter (Valeriano 2022; Villalta 2022). An example of a text-part that can be seen as both a literary composition and a scientific contribution is the famous letter from Philipp Melanchthon (1497-1560) to Simon Grynaeus (1493-1541) in defense of astrology and in favor of the study of astronomy as a teaching subject in the Reformed countries. The letter was included in a Sphaera textbook for the first time in 1531. It was printed another 64 times in its original form and 14 times as adapted by either Guillaume Des Bordes or Martin de Perer Bearnois (1520-1570) (Lalla 2003; Melanchthon 2022a, 2022b; Pantin 1987; Reich and Knobloch 2004).
This textual dissection of the editions in question was helpful because many text-parts recur; we were therefore able to create a diachronic network whose nodes are the text-parts. The goal was to analyze how the textbooks evolved by tracking the recurrences of the text-parts. To date, the corpus in its entirety contains 540 text-parts, which we have identified chronologically by publication date. Moreover, we considered recurrences only of text-parts that were published at least twice, with the second publication released at least one year after the first. Under these criteria, 241 text-parts remain, meaning 299 text-parts were published either only once or more than once but in the same year. The remaining 241 text-parts recur 1140 times in the 178-year timespan considered here.
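The selection criterion just described can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the function name, the input layout (one pair of identifier and year per appearance), and the identifiers in the sample are assumptions made for the example, and publication dates are assumed to be known only to the year.

```python
from collections import defaultdict


def recurring_text_parts(occurrences):
    """Return the text-parts that were printed in at least two different years.

    occurrences: iterable of (text_part_id, publication_year) pairs,
    one pair per appearance of a text-part in an edition (assumed layout).
    """
    years_by_part = defaultdict(list)
    for part_id, year in occurrences:
        years_by_part[part_id].append(year)

    recurring = {}
    for part_id, years in years_by_part.items():
        # Keep the part only if some printing falls at least one year
        # after its first printing.
        if max(years) - min(years) >= 1:
            recurring[part_id] = sorted(years)
    return recurring


# Hypothetical identifiers; the sample is illustrative, not corpus data.
sample = [
    ("tockler_commentary", 1503),
    ("tockler_commentary", 1509),
    ("same_year_part", 1537),
    ("same_year_part", 1537),
    ("single_part", 1531),
]
kept = recurring_text_parts(sample)
print(kept)  # {'tockler_commentary': [1503, 1509]}
```

Text-parts printed once, or several times within a single year, drop out, which corresponds to the 299 excluded text-parts mentioned above.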
Our primary focus is on knowledge-economy studies, meaning that we investigate epistemic processes such as the homogenization of knowledge by relating them to modes of knowledge production. For that reason, we also created a synchronic network in which text-parts are connected to one another by means of semantic relationships. The taxonomy of text-parts and of the semantic relationships used to connect them is as follows: "original text" and "text of reference", "commentary of", "translation of", and "fragment of". All possible combinations of these relationships are in place, and every text-part is identified by being in one or more pairwise semantic relationships with another.
For instance, the quadrivial professor Konrad Tockler (1470-1530), who taught at the University of Leipzig at the beginning of the 16th century, authored a commentary on De sphaera. This commentary is a text-part in itself. It was published in 1503 and again in 1509 (Kremer 2022; Sacrobosco et al. 1503, 1509; Valleriani and Citron 2020). The first time, the commentary was published together with another text-part, a fragment of Thābit ibn Qurra's (826-901) De imaginatione spere et circulorum. The second time, however, it was published with two additional text-parts, so that the edition contained Thābit's fragment, Tockler's commentary, and finally his description of how to build an armillary sphere (Ordinatio Spere materialis: et decem circulis: huic operi inserviens: per Magistrum Conradum Noricum noviter addita et diligenter revisa) (Tockler 2022). Our ontology (Kräutli and Valleriani 2018) allows us to connect not only the two aforementioned editions on the basis of the fact that they share a text-part (the same commentary) but also all editions that contain a commentary on the same reference text. Furthermore, we can connect all editions that contain one of the other text-parts that were printed together with the commentary at hand or with their translations, etc. Finally, we merged the diachronic and synchronic networks to study the modes of production of text-parts as components of editions and in their temporal evolution.
Another example that displays the power of our analysis from a philological point of view is the reconstruction of sources that informed, and of sources that were influenced by, Élie Vinet's (1509-1587) commentaries on Sacrobosco's De sphaera. This great French reformer of the mathematical disciplines (Desgraves 1977) authored five successive commentaries on De sphaera during his life (1551, 1556, 1558, 1569, and 1584), all published for the first time in Parisian editions (Vinet 2022a, 2022b, 2022c, 2022d). All these commentaries were influenced by different textual sources and became influential themselves in different regions. To organize and interpret the multitude of possibilities and the high dimensionality of the collected dataset, we created a multilayer network that could show us, in graph form, all possible connections along the temporal axis. This network provided us with a formal system that we analyzed by methods typically used in complex systems studies (Figure 1).

Figure 1. The complete multi-layer network representing how editions of the corpus are connected to one another according to six different parameters based on the taxonomy and the semantic relationships of the text-parts, as well as their recurrences over time (Red: Nodes, Blue: Edges). From (Zamani et al. 2020). Visualization realized with muxViz (De Domenico 2022; De Domenico et al. 2015).
Based solely on the dynamics of the text-parts, we were able to identify editions that became dominant all over Europe, meaning their content was "borrowed" by other printers and publishers, thus greatly contributing to the process of the homogenization of knowledge. Many of the popular editions were printed in the newly Reformed city of Wittenberg, which demonstrates how the Reformation influenced scientific and pedagogic content across the continent (Valleriani et al. 2019). Moreover, we were also able to show that a relatively small number of editions printed and published in Wittenberg during the short period between 1549 and 1562 formed a knowledge bridge. These "great transmitters" combined old and traditional texts with innovations that had emerged in the previous two decades. Such transmitters became paradigms for cosmology textbooks that endured long after the turn of the century. Finally, we were able to identify what we called "enduring innovations", innovative textbooks (also produced in Wittenberg) that adapted traditional schemes and became very influential for a long period of time (Zamani et al. 2020). What counts here is that for many years the process of the homogenization of knowledge was mostly based on the imitation of a central emanating model, represented by consecutive editions of textbooks produced in Wittenberg.
The method we present in the following, aimed at discovering social networks among printers and publishers, considers the editions of the corpus and disregards the deeper level represented by the semantic connections based on text-parts. We mention the results achieved to this point, first, because these results prompted the research questions that finally led to the development of the method described in this work, and second, because they are necessary for the validation of our method.
In fact, once the fundamental semantic and temporal dynamics of the homogenization process came to light, the necessity arose to understand the relationships between those dynamics and the social, economic, and institutional transformations that created the conditions for the production, distribution, and use of such textbooks. We needed to understand how and why the mechanism of imitation took place. First we examined the world of the authors of the commentaries.
An international working group closely scrutinized 43% of the textbooks (https://sphaera.mpiwg-berlin.mpg.de/doi-visualisation-authors-volume, accessed on 1 November 2022) constituting the corpus (Valleriani 2020a) and was indeed able to discover not only that all commentators actively contributed to the spread of this knowledge through their function as teachers and lecturers in the quadrivial disciplines, but also that many of them did so by acting as a bridge between educational institutions, such as gymnasia and universities, and the printers' workshops or the publishers. However, we were also able to show that the social network among commentators could not on its own explain the network of text-parts connected through their semantic relationships and, over time, their recurrences. In short, the social network among the scholars was just too small to explain the swift, continental process of homogenization that we had revealed (Valleriani 2020b). We therefore chose to shift our focus to the printers and publishers with the aim of analyzing their social networks and understanding whether specific constraints and relationships among them could explain the imitation phenomenon.

Printing and Business
Until recently, early modern printers and publishers-the book producers-involved in the academic book market have been almost completely ignored in the research of bibliographers and historians of science. The challenge emerged to determine how to collect data on their mutual relationships, with the aspiration of covering the corpus as systematically as possible. To start, their business model had to be understood-in particular, that academic textbooks, which belonged to the quadrivial disciplines, were first and foremost produced with an eye on the local market (Gehl 2013), that being the educational institutions in the region surrounding a printer's workshop or a publisher's insignia. What required further investigation were, first, the mechanisms that allowed a specific edition of a quadrivial textbook to enter the transregional market and, second, the aspects of production and distribution that led printers and publishers toward mutual imitation.
For this purpose, a different international working group completed a series of encompassing studies that constitute the background of the developments and results presented in this work. A series of studies that deepened the perspectives of the printers, publishers, patrons, commentators, educational institutions, and additional institutions (such as book fairs) (Valleriani and Ottone 2022b) allowed us to formulate what we defined as the business model of early modern printers and publishers in the academic book market (Valleriani and Ottone 2022a). An analysis of around half of the sources (https://sphaera.mpiwg-berlin.mpg.de/sphaera-printers-volume, accessed on 1 November 2022) showed that a social network sufficient to encompass the various mutual interactions among early modern printers and publishers could explain the epistemic processes described above. What was missing was a method of creating such a network through a systematic approach that considers all available sources and is not specific to the Sphaera corpus alone. The present work uses the Sphaera corpus as an example and proof in order to show how to create such a network, one in which printers and publishers are connected to each other on the basis of what we call "awareness relationships".
Because of the lack of systematic prosopographical and biographic data regarding early modern printers and publishers in the academic book market, we focused on the material at our disposal: the (electronic) printed editions collected in the Sphaera corpus. The combination of our source analysis with our knowledge concerning the business models and economic practices of early modern book producers created the opportunity to build such a network and to develop a new method of generating it automatically. This method can be fully replicated, as the datasets of all stages of analysis, all the code mentioned below, and the network files are available. In the following, the economic and production practices of early modern book producers will be discussed first.

Sequential New Editions
A quick look at printers' and publishers' production data reveals that many of them tended to put the same textbook on the market many times. In these cases, the second and consecutive editions are denominated as either "reprints" or "reissues" of the first, depending on a printer's strategy. Examples of such printers include the Wittenberg printer Johann Krafft the Elder (1510-1578) (Krafft the Elder 2022) and the Frankfurt-based printer Peter Braubach (1500-1567) (Braubach 2022).
Obviously, the reasons for such frequent releases are to be found in the nature of the academic book market. Then as now, new students had to be provided with new textbooks on a nearly yearly basis. Frequent distribution of new editions was also a means to decrease the financial consequences of the secondhand market (Nuovo 2013).
There were several methods of maintaining such a pace. Because of the printing technology, early modern printers tended to produce print runs that were larger than what was actually needed according to their sell-through rate (Werner 2019, pp. 8-23). A great deal of the skilled work required to produce a book was related to the composition of the forme and therefore to the translation from manuscript to printed page with a specific layout: length and number of lines, positions of headings and illustrations, and special characters are only a few aspects that the compositors had to manage when composing a forme for each sheet. Once the forme was ready, the routine work at the press could reach impressive peaks of efficiency, and several thousand sheets were easily printed in only one day. The production of an edition was complete when all the sheets for each individual copy in the print run were fully printed. The production costs for each unit, therefore, diminished as the size of the print run increased. If a printer was able to sell every copy of a print run at a fixed price, then the larger the print run, the higher the profit gained from each copy. Textbook printers, therefore, always had at least two choices when they put a new edition of a previously published textbook on the market: either they produced a new forme for every edition (reprint) or, because of the repetitive nature of the textbooks and the strengths and weaknesses of early modern printing technology, they printed particularly large print runs and put only portions on the market for each edition, while storing the other copies for the years to come (reissue). In this way, several reissues could be realized by putting units on the market that were printed years earlier and simply producing a new title page and perhaps a new final page containing the colophon with an updated publication year (Maclean 2009, 2022). Printers also had a third choice.
As mentioned above, such textbooks were compositions of different texts that we call text-parts. This characteristic offered the opportunity to realize new editions with minimal effort by introducing and/or eliminating one or more text-parts. In this way, large portions of past print runs could be re-utilized and a new edition realized with minimum investment. Technically, this mechanism was made possible by the fact that books were not sold as bound books like they are today. They were sold as a series of unbound quires of sheets. It was the customer who decided how to bind the quires and possibly which additional quires (and therefore which additional books) were to be bound together with the first.
Another important motivation for printers and publishers toward such "hybrid" reissues is related to the practice of "privileges". For the most part, privileges were initially granted to printers and publishers to protect their product. They therefore became fundamental instruments for planning production. Privileges, however, were granted only for a limited number of years; in Paris during the second half of the 16th century, for instance, usually for only two. To obtain a new privilege for a new edition, the printer had to display a work that contained some novelties (or at least differences) compared with the previous edition. Introducing and/or eliminating a text was one tactic a printer could exploit in order both to re-utilize stored printed pages and to keep obtaining privileges for each new edition on the market (Nuovo 2013, pp. 195-260).
With this in mind, it is relevant to note that in all three cases, a sequential new edition would appear similar or almost identical to its predecessor in both its content and its overall layout-its format and the way the book appears to the eyes of the reader. It is this characteristic, together with the financial and production practices described in the next section, that allowed us to conceive of a method to automatically build a historically meaningful social network among early modern printers.
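The economic argument behind large print runs can be made concrete with a toy calculation. The sketch below assumes a simple cost model that the text does not spell out: a one-off composition cost for setting the formes, spread over the print run, plus a constant cost per copy for paper and presswork. The function names and all figures are hypothetical and serve only to show the direction of the effect.

```python
def unit_cost(print_run, composition_cost, cost_per_copy):
    """Cost of one copy when the one-off composition cost is spread over the run."""
    if print_run <= 0:
        raise ValueError("print_run must be positive")
    return composition_cost / print_run + cost_per_copy


def profit_per_copy(price, print_run, composition_cost, cost_per_copy):
    """Margin on each copy sold at a fixed price (assumes every copy sells)."""
    return price - unit_cost(print_run, composition_cost, cost_per_copy)


# Hypothetical figures in an arbitrary currency unit.
for run in (250, 500, 1000):
    print(run, unit_cost(run, 1000.0, 2.0), profit_per_copy(8.0, run, 1000.0, 2.0))
# 250 6.0 2.0
# 500 4.0 4.0
# 1000 3.0 5.0
```

Under these assumptions the margin per copy grows with the print run, which is why storing part of a large run for later reissues could be preferable to composing a new forme for every edition.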

Copying and Tauschhandel
Due to the tendency to produce large print runs at great cost, it was common for a printer's workshop to come into financial difficulties or even to go bankrupt. One way to avert such fatal crises was for a printer to enlarge, as much as possible, their own catalogue of printed titles and to enter as many knowledge fields as possible. In this way, many portions of literate society could be reached as potential customers and the probability of experiencing intervals without income decreased. For this reason and on the basis of the practice of reissuing, printers and publishers were often connected with each other according to the practice of the Tauschhandel (barter) (Maclean 2021, pp. 50-51, 247-78). In practice, two printers would exchange portions of two respective print runs so that both of them, after having printed a new title page (and possibly a new colophon page), could list the edition in their catalogues and sell it. Especially in reference to textbooks, this practice could reach two discrete local markets without placing the two printers in direct competition. Due to the possibility of increasing the margin for each unit by increasing the size of the print run, both printers could help each other in a truly win-win situation. A good example of Tauschhandel among printers is provided by the two 1582 editions produced in Antwerp and sold by the brothers Pierre and Jean Bellère (Sacrobosco et al. 1582a, 1582b). A close analysis of two exemplars sold by the two brothers respectively clearly reveals that they belong to the same print run. One of the two brothers (unfortunately, we do not know which) almost surely produced all copies and gave a certain number of them to his brother to sell, most likely elsewhere. Clearly, the brothers' relationship was not only familial but also commercial.
Another way to decrease the probability of incurring financial difficulties was to lower the investment capital required for each new first edition or for a new text-part to be introduced into copies of an extant edition. As mentioned, the composition of the forme was a time-consuming task, executed by a skilled (expensive) laborer. A simple way to speed up this work was to (re-)produce a new edition or a new text-part by relying on a previously printed book instead of a manuscript. In this way, the compositor only had to copy the text and all layout aspects one-to-one. No creative work was needed, and the work could be accomplished quickly. Imitation in this case meant reprinting, but at a much lower cost. True, if a book was protected by a privilege and the imitating printer's workshop was located in the same juridical region, this practice could lead to legal difficulties. However, again, the repetitiveness of the textbooks along with the wider process of the homogenization of scientific knowledge created the background against which plenty of opportunities emerged to imitate other printers' editions without difficulty.
As mentioned above, for instance, our analysis of the content of the editions alone has shown that editions printed in Wittenberg beginning in the 1530s were particularly imitated all over Europe. In this case, privileges granted in the Reformed countries were simply not valid in the Catholic ones, a situation that enormously facilitated the satisfaction of the Catholic hunger for Reformed science. A demonstration of this effect can be clearly observed in the diffusion of the octavo as a format for such textbooks. A similar format was a precondition for imitating the layout and content of another book. After the introduction of the octavo book format in Wittenberg in 1531 by the printer Joseph Klug (1490-1552) (Sacrobosco and Melanchthon 1531), the use of this format in the academic book market spread at lightning speed across Europe (Pantin 2020; Valleriani et al. 2019).
Reissuing, hybrid reissuing, Tauschhandel, and imitating were all practices that created an abundance of editions of textbooks all over Europe-editions that all look very similar, but are never identical. If we had no title page and colophon, we would often not be able to discern among them. What they all have in common is that they presuppose a previous edition, or at least the availability of a previous edition, in order to be produced. What they do not have in common is that in the case of imitation and Tauschhandel, two sequential or parallel editions involve two different printers and/or publishers, whereas reissuing always concerns one individual book producer at a time. It is this fundamental difference that opens the doors to the possibility of automatically detecting relationships among book producers involved in the academic book market during the early modern period.

Awareness and Its Formal Expression
If we consider two exemplars of two different editions and can state either that one is an imitation of the other or that both were printed in the same print run, and if both editions are claimed to be printed by two different printers or financed by two different publishers, then we can state that a relationship of "awareness" was in place between the two book producers. Awareness only means that one was aware of the existence of the other or, more precisely, of the existence of the other's edition. The awareness relationship can, but does not have to, subsume a real economic relationship, such as the practice of Tauschhandel. The relationship itself, moreover, does not specify how the second book producer became aware of the edition of the first. Be this because exemplars were displayed at the book fairs, or because of a traveling pupil or scholar, or for whatever other reason, the concept of awareness represents an abstract relationship that can only be described empirically by means of further historical sources (if these are available) in reference to each individual pair of editions.
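The definition above can be read as a rule that turns pairs of similar editions into links between book producers. The following sketch illustrates that reading only; it is not the implementation used in this work. It assumes that the pairs of similar editions have already been determined by some other means, treats the relationship as undirected for simplicity, and uses hypothetical edition identifiers (which Bellère brother is attached to which 1582 edition is an arbitrary choice for the example).

```python
def awareness_edges(producers_by_edition, similar_pairs):
    """Derive undirected awareness links between book producers.

    producers_by_edition: dict mapping an edition id to the set of its
        printers and publishers (assumed layout).
    similar_pairs: iterable of (edition_a, edition_b) judged to be an
        imitation of one another or part of the same print run.
    """
    edges = set()
    for edition_a, edition_b in similar_pairs:
        for first in producers_by_edition[edition_a]:
            for second in producers_by_edition[edition_b]:
                # A reissue by the same producer yields no relationship.
                if first != second:
                    edges.add(frozenset((first, second)))
    return edges


# Hypothetical identifiers and assignments, for illustration only.
producers = {
    "antwerp_1582a": {"Pierre Bellère"},
    "antwerp_1582b": {"Jean Bellère"},
    "reissue_1": {"Printer X"},
    "reissue_2": {"Printer X"},
}
pairs = [("antwerp_1582a", "antwerp_1582b"), ("reissue_1", "reissue_2")]
edges = awareness_edges(producers, pairs)
print(edges)
```

The pair of reissues by the same hypothetical printer produces no link, which mirrors the distinction drawn earlier between reissuing on the one hand and imitation or Tauschhandel on the other.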
Such an abstract way of working with relationships among printers and publishers is justified by the necessity to achieve a systematic set of relational data with the aim to cover all the editions that constitute the corpus and, as a second step, early modern books in general. As will be shown later from the perspective of the historical interpretative framework, the whole set of awareness relationships represents clusters of book producers who exerted a mutual, generic influence on one another; such clusters can be localized temporally and spatially with great precision by making use of dates of birth and death, the time periods during which the book producers were actually active, and finally the publication years and the places of publication of the editions under consideration.
Before moving to the method used to shape and build such clusters, however, it first needs to be clarified how two exemplars of two different editions can be formally defined as similar, which is to say either belonging to the same print run or one as imitation of the other. For this purpose, we make use of the fingerprint. In the words of Owen Massey McKnight (https://users.ox.ac.uk/~bodl0842/fingerprints, accessed on 1 November 2022), a librarian at Jesus College in Oxford, "a fingerprint is a sequence of characters derived from the text of an early printed book. A fingerprint can be used to detect variant settings of type in otherwise matching editions and to identify the reuse of the same setting in ostensibly different editions. It can also act as an identifier for any printed work, assisting identification of partial texts. Loosely, the fingerprint is an «ISBN for older books»".
As an example, the fingerprint of the edition produced by Klug in Wittenberg in 1531 is:

The fingerprint does not directly refer to an edition or its entire print run but to each single copy. If two copies were identical, they would have exactly the same fingerprint. However, this is very rarely the case. In fact, the fingerprint is created by following specific rules that dictate where to extract the single characters that constitute the sequence. The fingerprint is divided into seven parts. The first four parts are alphanumerical, each of four positions: in the example above, r-t., s,Ch, r.lu, and caar. Take, for example, the first part, r-t. This is built by going to the first recto page after the title page and then noting the last two symbols of the last line (from left to right) and then the last two symbols of the line above it (also from left to right) (Figure 2).
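The rule for one four-position group can be sketched as follows. This is a simplified illustration under stated assumptions: the page is given as a list of already transcribed lines, blank lines and whitespace are ignored, and none of the exceptions handled by the full manual of rules are covered. The function name and the two sample lines are invented for the example and are not taken from Klug's edition; they were chosen only so that the result matches the group r-t. discussed above.

```python
def fingerprint_group(page_lines):
    """Build one four-character fingerprint group from a transcribed page.

    Takes the last two symbols of the last line, then the last two
    symbols of the line above it, each read from left to right.
    Whitespace is dropped before counting symbols (an assumption).
    """
    lines = ["".join(line.split()) for line in page_lines]
    lines = [line for line in lines if line]  # skip blank lines
    if len(lines) < 2:
        raise ValueError("at least two non-empty lines are required")
    last_line, line_above = lines[-1], lines[-2]
    return last_line[-2:] + line_above[-2:]


# Invented lines, not a transcription of any edition in the corpus.
sample_page = [
    "primum mobile dicitur, quia mouet.",
    "omnes alias sphaeras inferiores super-",
]
print(fingerprint_group(sample_page))  # r-t.
```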
The other parts are compiled by applying the same rule to different pages. A series of rules specifies which pages are to be used and, especially, how to deal with a myriad of possible exceptions, such as the absence of specific pages, the use of text in columns, the presence of tables and lists, and many other contexts. The fingerprints of all 359 editions constituting the corpus were generated by making use of a manual of rules adapted from the one followed by the Censimento nazionale delle edizioni italiane del sedicesimo secolo (EDIT16 2022). The adaptation, now a freely available manual, was required to cope with the further degree of noise due to the heterogeneity of the scanning processes followed by archives and libraries over the last 20 years (Beyer 2019). On the basis of the operative rules to generate fingerprints, it is therefore clear that, when the fingerprints of two copies are similar, the copies resemble each other in both content and layout. What is still needed is a definition of fingerprint similarity that mirrors the real similarity between two copies sketched above and displayed in Figure 3. For this step, however, further considerations related to the material historical sources are required.
Figure 3. Six editions similar in content and layout were identified by considering their fingerprints and by applying the first metric. One page (page 28) with a particularly rich layout was selected to paradigmatically display the results. From (Sacrobosco and Clavius 1585, 1591, 1603, 1618). Courtesy of the Library of the Max Planck Institute for the History of Science.
Two copies can be similar but, for many reasons, are almost never identical. Within the same print run, procedures such as the inking of the forme or the manual insertion of the sheets (as well as other unexpected complications, such as the necessity to execute corrections in the text or in the forme while printing, possible deformations or breaks of the forme during the press cycle, the changing quality of the paper, and many others) invariably led to differences among copies upon close inspection. In cases when a new text-part was introduced or eliminated, two fingerprints can be similar only until the page before such a change was applied. In the cases of imitation, further differences were inevitably in place. Therefore, although fingerprints express both form and content, assessing a relationship of awareness by making use of fingerprints requires a specific metric that allows us to compare them and, at the same time, to account for differences (with a degree of variability) due to the reasons just mentioned. Such a step has hitherto never been attempted, though bibliographers have been using fingerprints for many decades.

Metrics of Similarity among Fingerprints and Validation
We conceived two metrics in order to be able to compare them empirically.
First metric and its results. First, we decided to compare fingerprints by defining a relationship of similarity when at least one of the first four parts of the fingerprint is repeated in both fingerprints. Any of the four parts qualifies. In this way we could achieve lists of similar fingerprints where, for example, two consecutive fingerprints look like these:
BookID 1925: aqta m.um ece- leli (C) 1490 (R)
BookID 1926: aqta m.n- ece- asua (C) 1491 (R)
respectively from (Sacrobosco et al. 1490) and (Sacrobosco et al. 1491) and where 'BookID' refers to the entry of the database of The Sphere project (Sphere Project 2022).
In this example, the first and the third parts are identical, the second is similar, and the fourth is different. Sometimes symbols in a printed book are illegible or, in other cases, they do not belong to the list of special characters that generate fingerprints according to the rules. In these cases, a fingerprint part contains at least one position filled with "*". Moreover, if pages or lines that are relevant to the generation of a fingerprint are missing or damaged, the corresponding fingerprint position is filled with "+". We relaxed the rule by defining two parts of two fingerprints as identical when all positions are expressed by the same symbol, with the exception of + and *, which are treated as identical to any symbol. Thus, according to our rule, the four parts of the following two fingerprints are identical, even though the second part of the fingerprint does not look identical: BookID 2105: t.s? **eu s.am inqu (3)  respectively from (Beyer 1560a, 1560b).
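The wildcard rule can be illustrated with a minimal Python sketch. The function names `split_parts` and `parts_identical` are hypothetical and not taken from the project's code. The sketch also assumes that each of the first four parts has exactly four positions and contains no internal spaces, so that the parts can be recovered by cutting the symbol sequence into groups of four.

```python
WILDCARDS = frozenset("*+")  # "*": illegible or unlisted symbol; "+": missing or damaged page or line


def split_parts(symbols: str) -> tuple:
    """Cut the alphanumerical portion of a fingerprint into its four parts.

    Assumption of this sketch: four parts of four positions each, with no
    spaces inside a part. Whitespace between parts is ignored.
    """
    compact = "".join(symbols.split())
    if len(compact) != 16:
        raise ValueError("expected four parts of four positions each")
    return tuple(compact[i:i + 4] for i in range(0, 16, 4))


def parts_identical(part_a: str, part_b: str) -> bool:
    """Two parts count as identical if every position matches or holds a wildcard."""
    if len(part_a) != len(part_b):
        raise ValueError("parts must have the same number of positions")
    return all(
        a == b or a in WILDCARDS or b in WILDCARDS
        for a, b in zip(part_a, part_b)
    )
```

For instance, `parts_identical("**eu", "o-eu")` holds (the second value is made up for illustration), whereas `parts_identical("m.um", "m.n-")` does not.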
In this way we obtained 56 chains of editions, involving 301 editions in total. "Chains of editions" are chronologically ordered sequences of editions whose fingerprints share at least one part according to the specifications above. The algorithm that creates the chains follows a two-step process. First, the fingerprint metric is calculated for each temporally possible pair of books as a distance vector with one entry for each fingerprint part. As an example, consider again the two fingerprints for BookIDs 1925 and 1926 above. The fingerprint distance metric for these two editions yields the vector (0,2,0,4), meaning that two parts are identical, one part differs in two positions, and the last part is completely different. A directed link is finally created if the similarity condition for the metric is fulfilled (i.e., the distance vector contains at least one 0).
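The first step can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: all function names are hypothetical, and because the treatment of editions published in the same year is not specified here, the sketch assumes that a link runs only from a strictly earlier edition to a later one.

```python
from itertools import combinations

WILDCARDS = frozenset("*+")  # positions that match any symbol


def part_distance(part_a, part_b):
    """Count the positions at which two parts differ, ignoring wildcards."""
    return sum(
        1
        for a, b in zip(part_a, part_b)
        if a != b and a not in WILDCARDS and b not in WILDCARDS
    )


def distance_vector(parts_a, parts_b):
    """One distance entry for each of the four alphanumerical parts."""
    return tuple(part_distance(a, b) for a, b in zip(parts_a, parts_b))


def first_metric(vector):
    """Similarity condition of the first metric: at least one identical part."""
    return 0 in vector


def directed_links(editions):
    """editions maps BookID -> (year, parts); links run from earlier to later."""
    ordered = sorted(editions.items(), key=lambda item: (item[1][0], item[0]))
    links = []
    for (id_a, (year_a, parts_a)), (id_b, (year_b, parts_b)) in combinations(ordered, 2):
        # Assumption: only strictly earlier -> later pairs are considered.
        if year_a < year_b and first_metric(distance_vector(parts_a, parts_b)):
            links.append((id_a, id_b))
    return links


editions = {
    1925: (1490, ("aqta", "m.um", "ece-", "leli")),
    1926: (1491, ("aqta", "m.n-", "ece-", "asua")),
}
print(distance_vector(editions[1925][1], editions[1926][1]))  # (0, 2, 0, 4)
print(directed_links(editions))  # [(1925, 1926)]
```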
Next, a directed graph is created out of the set of all edges. In this graph, possible chains are simple directed paths (i.e., paths following the directed edges and never repeating a node) from nodes without incoming edges (roots) to nodes without outgoing edges (leaves).
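A minimal sketch of this second step is given below, assuming the directed links are available as pairs of BookIDs. The function name `root_to_leaf_paths` is hypothetical. Because every link points from an earlier to a later edition, the graph cannot contain cycles; the explicit check against repeated nodes is kept only to mirror the definition of a simple path.

```python
from collections import defaultdict


def root_to_leaf_paths(edges):
    """Enumerate all simple directed paths from roots to leaves."""
    successors = defaultdict(list)
    nodes, with_incoming = set(), set()
    for source, target in edges:
        successors[source].append(target)
        nodes.update((source, target))
        with_incoming.add(target)

    paths = []

    def walk(path):
        following = successors.get(path[-1], [])
        if not following:  # leaf: no outgoing edges
            paths.append(path)
            return
        for node in following:
            if node not in path:  # never repeat a node
                walk(path + [node])

    # Roots are the nodes without incoming edges.
    for root in sorted(nodes - with_incoming):
        walk([root])
    return paths
```

For the edges `[(1, 2), (2, 3), (1, 3)]` the sketch returns the two candidate chains `[1, 2, 3]` and `[1, 3]`. The number of such paths can grow quickly in densely linked graphs, which is one reason why a grouping step is needed afterwards.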
We first grouped all such paths by checking whether they share nodes. All paths with shared nodes were manually joined. This step was concluded with an empirical validation, which served both to clean the chains and to eliminate those sequences, or single pairs of editions, that close inspection revealed not to be similar to each other with respect to form and content. The final result still consists of 56 chains, but involves 228 editions. The empirical validation therefore showed that almost 76% of the relationships were correct and that we can use fingerprints to investigate trajectories of awareness among printers and publishers in the early modern period, as the following example clearly shows.
Chain No. 7 is constituted of 18 editions. Through the library of the Max Planck Institute for the History of Science in Berlin we were granted direct access to six of them (Sacrobosco and Clavius 1585, 1591, 1603, 1618). We selected one page in the oldest edition (1585) and searched for the same page in the five later editions. The page was selected because its layout is particularly rich: it displays different fonts for different texts (original text and commentary), headings, printed marginalia, and one scientific illustration, possibly obtained by using two different woodblocks placed one close to the other (Figure 3).
The last of the 6 editions was published 33 years after the first. The place of publication moves back and forth between Rome and Venice, even reaching Lyon. This selection showed the power of our method; it turned out that these were all editions produced by different printers and publishers but of the same work, namely the commentary on Sacrobosco's Sphaera by the famous Jesuit mathematician Christophorus Clavius (1538-1612) (Sacrobosco et al. 1584).
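The grouping of paths that share nodes, which precedes the manual joining and validation described above, can be sketched as follows. The function name `group_paths` is hypothetical; the sketch only collects paths into groups and does not attempt to reproduce the manual decisions taken during validation.

```python
def group_paths(paths):
    """Collect paths into groups so that paths sharing any node end up together.

    Returns a list of (nodes, paths) pairs; the node sets are pairwise disjoint.
    """
    groups = []
    for path in paths:
        nodes = set(path)
        merged = [path]
        untouched = []
        for group_nodes, group_paths_ in groups:
            if group_nodes & nodes:
                # The new path bridges into this group: absorb it.
                nodes |= group_nodes
                merged = group_paths_ + merged
            else:
                untouched.append((group_nodes, group_paths_))
        untouched.append((nodes, merged))
        groups = untouched
    return groups
```

With the paths `[[1, 2, 3], [1, 3], [4, 5], [5, 6], [7]]`, the sketch yields three groups with the node sets {1, 2, 3}, {4, 5, 6}, and {7}.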
Second metric and its results. The second metric is less relaxed. It includes all the conditions of the first and, in addition, requires that the two fingerprints share at least two further symbols, in whichever part but at the same positions. Expressed with the distance vector mentioned above, the condition requires one identical part (distance 0) and another part with a distance of at most 2 (i.e., at most two symbols have to be changed to obtain an identical fingerprint part). This resulted in a list of 47 chains, including 178 editions. The list of chains was empirically validated following the same procedure, and the results show 41 remaining chains that involve 169 editions. The less relaxed metric therefore yields 94% correct pairwise relationships among the editions.
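Expressed on the distance vector, the two conditions can be compared with a short sketch. The function names are hypothetical, and the sketch takes the distance vector as given, so it makes no assumption about how wildcard positions are counted.

```python
def first_metric(vector):
    """At least one part is identical (distance 0)."""
    return 0 in vector


def second_metric(vector):
    """One identical part plus another part with a distance of at most 2."""
    zero_positions = [i for i, distance in enumerate(vector) if distance == 0]
    if not zero_positions:
        return False
    anchor = zero_positions[0]  # the part that satisfies the first condition
    return any(
        distance <= 2
        for i, distance in enumerate(vector)
        if i != anchor
    )
```

The vector (0,2,0,4) from the example of BookIDs 1925 and 1926 satisfies both metrics, whereas a vector such as (0,3,4,4) satisfies only the first.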

Metric Comparison and Choice
The first aspect that needs to be highlighted concerns the empirical validation. The editions were compared pairwise with respect to both their text-parts and their layout by checking them page by page for those text-parts that they share. For the two metrics, the manual work required to obtain the definitive validated chains was very different. In the first case, the chains had to be both divided and integrated, and some chains (or portions of them) were eliminated. The resulting 56 chains, though equal in number to the list created from running the code, are not necessarily the same chains; that the final number of chains did not change is a coincidence. For the second, less relaxed metric, no splitting was necessary, as the result had already solved this problem. However, certain chains, present in both lists, were shorter in the second (i.e., they contained fewer editions than their equivalents generated from the application of the first metric). We can therefore conclude with certainty that the second metric overlooks true relationships.
The second aspect relates to the entire corpus. As mentioned, this is constituted of 359 editions. After validation, the application of the first metric allows us to consider 228 editions, about 63% of the entire corpus. With the application of the second metric, this value decreases to roughly 47%. This means that the second metric overlooks about 16% of editions involved in true relationships. As these are empirically validated data and given our final historical goal to reconstruct clusters of awareness among book producers, we opt for the first metric. A method to further automate the cleaning is suggested in the last section of this work. The data concerned with these chains before and after validation are also available.
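The proportions quoted above follow directly from the counts reported in the text, as the following short calculation shows. Only the variable and function names are introduced here for illustration.

```python
CORPUS_SIZE = 359  # editions in the corpus

# Editions remaining in the chains after empirical validation.
validated = {"first metric": 228, "second metric": 169}


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


coverage = {name: share(count, CORPUS_SIZE) for name, count in validated.items()}
for name, value in coverage.items():
    print(f"{name}: {value:.1f}% of the corpus")

gap = coverage["first metric"] - coverage["second metric"]
print(f"difference: {gap:.1f} percentage points")
```

The calculation gives 63.5% and 47.1% of the corpus, a difference of 16.4 percentage points, in line with the rounded values given in the text.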

Clusters of Awareness Relationships
Chains of editions are chronologically ordered sequences. At first sight, they therefore seem to suggest that in each chain the second edition is influenced by the first, the third by the second, and so on. However, within the historical interpretative framework, this conclusion cannot be stated on the basis of the chains of editions. As mentioned, only a dual analysis of the sources and additional relevant historical sources could lead to such a deep and certain conclusion. Nothing at this stage precludes that the third edition of a chain, for instance, was influenced by the first as opposed to or alongside the second. No assumption in this respect is permitted.
To overcome this problem, we first created a graph of editions that contains all chains and in which each edition A of a chain is connected to all temporally following editions B of the same chain (assumption of maximal connectivity). At this stage, however, additional conditions are added. As our goal is to determine networks of printers and publishers, we had to consider only editions published while their producers were alive, which means that we made use of additional metadata: dates of birth and death as well as periods of activity. A different case study on the circulation of paratexts has shown that adding 10 years backward and forward to the dates beginning and ending activity does not significantly change the results (Valleriani and Sander 2022). Therefore, two editions are connected in our graph if the printers or publishers of edition A were still alive when B was published and B was published by different printers or publishers. These two additional conditions alone significantly decrease the number of editions involved and the size of the network. At this stage, the nodes of the graph are still the editions. Their number has now decreased to 145, and the number of edges is 512. In total, the network is constituted by 31 components, which means that 25 chains are now excluded (Figure 4).
The assumption of maximal connectivity is applied. The network is generated by applying the additional conditions that the book producers of two different editions connected in the network must be two different agents and must have been alive at the times of both publications.
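The construction of this graph for a single chain can be sketched as follows. The sketch rests on two assumptions that are not spelled out in the text: an edition A is linked to a later edition B if at least one producer of A was still alive (or active) in the publication year of B, and "different printers or publishers" is read as the two sets of producers having no member in common. The function name and all data in the usage example are hypothetical.

```python
from itertools import combinations


def potential_awareness_edges(chain, alive_until):
    """Connect each edition to all later editions of the same chain.

    chain: list of (edition_id, year, frozenset of producer names)
    alive_until: dict mapping producer name -> last year alive or active
    """
    ordered = sorted(chain, key=lambda edition: edition[1])
    edges = []
    for (id_a, year_a, producers_a), (id_b, year_b, producers_b) in combinations(ordered, 2):
        if year_a == year_b:
            continue  # assumption: only temporally following editions are linked
        # Assumption: one surviving producer of A suffices.
        survivor = any(alive_until[p] >= year_b for p in producers_a)
        # Assumption: the two editions must not share any producer.
        different = producers_a.isdisjoint(producers_b)
        if survivor and different:
            edges.append((id_a, id_b))
    return edges


# Hypothetical chain of three editions by two producers, P and Q.
chain = [
    ("E1", 1585, frozenset({"P"})),
    ("E2", 1591, frozenset({"Q"})),
    ("E3", 1603, frozenset({"P"})),
]
alive_until = {"P": 1610, "Q": 1595}
print(potential_awareness_edges(chain, alive_until))  # [('E1', 'E2')]
```

In the example, E1 and E3 are not linked because they share the same producer, and E2 and E3 are not linked because Q was no longer alive in 1603.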
As for the connections, our method gives rise to edges based on nine different combinations between printers, publishers, and "printerpublishers," the last being those individuals who covered both roles. While seven kinds of these connections constitute about 10% each of all edges, the connection from printer to publisher covers only 3% of all connections, whereas the connection from printerpublishers to printerpublishers covers 30% of the edges. The edge parameter, therefore, captures the reason for a connection based on the individual's role as well as the basic fingerprint chain leading to the connection (Figure 5). The edge parameters and the links from any previous edition to all successive editions in the same chain (assumption of maximal connectivity) imply that, while moving from the empirically validated chains of editions to the graph created on their basis, we also move from the concept of awareness relationship to one of potential awareness relationship. We do not exclude any true relationships, but, despite the reduction of the dataset, we might still include relationships that could be ruled out by further empirical analysis. How far this step holds from an epistemological and (especially) an empirical point of view will be shown in the following.
As in the case of the choice of the metric, we conceived a system that, first, allows us to be certain that no true relationship is excluded and, second, is susceptible to reduction in a second step. Such a reduction is indeed intrinsic to the nature of our research question: the social network of printers and publishers connected by an awareness relationship. One effect of the code for the creation of chains of editions and of the manual cleaning is that one edition cannot be a member of more than one chain; the opposite would lead to the conclusion that one edition has been influential on two dissimilar editions. In fact, the heterogeneity of historical sources, which historians are always faced with, does not exclude this possibility in absolute terms. In particular, variations due to different preservation statuses could lead to such a violation of the transitive function. Our goal at this stage, however, was not to exclude this case but to contemplate it in the creation of the general graph of maximal connection just described. Once the graph is created and all possible connections are contemplated, it can indeed be very easily translated into a social graph; it is reduced automatically in the same step.
Whereas editions can only appear once in the entire list of chains, this is not the case for the book producers. Printers and publishers are often mentioned in more than one chain, especially those who remained active in the academic book market for a considerable period of time. After a series of reissues, they often decided on a reprint, namely an edition that looked different from the previous ones. We postulate that this was most likely due to the dynamics of the market. The production of a reprint was then the occasion to take advantage of the methods described above in order to minimize the risks of capital exposure and perhaps to imitate another book producer. Every time this phenomenon took place, the printer or publisher in question appears in a different chain of editions. This ultimately means not only that the number of book producers is significantly smaller than the number of editions involved in the chains, but also that, at the level of social relationships, the chains are indeed connected to one another. In total, we obtain a list of 95 printers and publishers. On the basis of the structure of the data for the previous graph (Figure 4), we create a new graph in which the nodes are the book producers instead of the printed editions. The resulting network is now constituted of 17 components, which replace the original chains thanks to the translation into social relationships (Figure 6). We kept only one edge between two nodes (book producers) independent of the number of relationships among their respective editions. In this way, the overall graph contains 201 edges expressing potential relationships of awareness. It is conceivable to assign a weight to the relationships as a function of the number of editions that connect pairs of book producers, but, given that the awareness relationships are only potential, we rejected this possibility to avoid introducing an undemonstrated bias into the dataset.
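The translation from the graph of editions into the graph of book producers can be sketched as follows. The function names and the example data are hypothetical, and the sketch assumes that the single edge kept between two producers is undirected. Components are computed with a small union-find structure in place of a graph library.

```python
def producer_edges(edition_edges, producers_of):
    """Project edition-to-edition edges onto their producers.

    Storing each pair as a frozenset keeps only one edge per pair of
    producers, however many editions connect them (no weights).
    """
    edges = set()
    for edition_a, edition_b in edition_edges:
        for p in producers_of[edition_a]:
            for q in producers_of[edition_b]:
                if p != q:
                    edges.add(frozenset((p, q)))
    return edges


def components(edges):
    """Connected components of an undirected graph given as a set of pairs."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for edge in edges:
        p, q = tuple(edge)
        parent[find(p)] = find(q)

    grouped = {}
    for node in list(parent):
        grouped.setdefault(find(node), set()).add(node)
    return list(grouped.values())


# Hypothetical editions E1..E5 and producers A..E.
producers_of = {"E1": {"A"}, "E2": {"B"}, "E3": {"A"}, "E4": {"C"}, "E5": {"D", "E"}}
edition_edges = [("E1", "E2"), ("E3", "E2"), ("E4", "E5")]
social_edges = producer_edges(edition_edges, producers_of)
print(len(social_edges), len(components(social_edges)))  # 3 2
```

In the example, the two edition-level links between the editions of A and B collapse into a single social edge, which is the deduplication described above.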
We think that the shift of focus from the editions to their producers minimizes the redundancy created by the maximal-connection assumption used to create the graph of similar editions. Book producers were active for a certain period of time, and it can easily be the case that their work was influenced by more than one edition published somewhere else and by different book producers. In addition, social connections often express themselves in triads instead of pairs. This allows for a spread of knowledge that, in our opinion, can best be described with a more holistic approach. This assumption is in fact based on a micro case study in which, by means of a similar approach, the network of printers and publishers was reconstructed according to the fact that they published the same paratexts, for instance the same dedication letter to the same patron (Valleriani and Sander 2022).
It is important to note again that the relationships among editions, from which the social graph was built, always respect real temporal dimensions: no edition is connected to another if one of the book producers involved was no longer alive at the time of publication. In cases when a very short period of activity is the only available information concerning the biography of a book producer under consideration, we added 10 years both before and after the first and last years of publication of their known printed works as made available by the CERL Thesaurus (CERL Thesaurus 2022).
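The padding of short activity periods can be written down in a few lines. The function names and the years in the usage note are hypothetical; only the margin of 10 years is taken from the text.

```python
def activity_window(first_publication, last_publication, padding=10):
    """Extend a known period of activity by `padding` years on both sides."""
    return first_publication - padding, last_publication + padding


def within_window(year, window):
    """Check whether a publication year falls inside the (padded) window."""
    start, end = window
    return start <= year <= end
```

A producer known only from works printed between 1561 and 1563, for example, would be treated as potentially active from 1551 to 1573.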

Validation of Social Clusters
In the following, we discuss most of the components of the graph of Figure 6, where a component represents a cluster of book producers who were socially connected to one another on the basis of relationships of awareness as based on the practice of imitation (Fangerau 2013;Feely and Hiinks 2015). Our intention was to use the previously achieved results, mentioned in Section 1, to validate the method, but new relationships were discovered as well, an aspect that reinforces the validity of our approach and encourages further use, as discussed in the last section.

Six Small Clusters
We first discuss six small clusters, three of them constituted by just a pair of book producers, the other three by a triple connection, two of which are linear chains (Figure 7). Starting from top left and moving clockwise, the six clusters are constituted by the book producers listed in Table 1 (the data of the different clusters are separated by an empty line).
All these connections occur in the context of a local market, which makes them highly probable. In reference to the second cluster of the list, situated in Leipzig between the end of the 15th century and the first half of the 16th century, the mutual influence has already been demonstrated in an in-depth study by Richard Kremer (Kremer 2022). Apparently a similar situation took place again in Leipzig at the beginning of the 17th century, as the first chain clearly shows. In the case of the triad, situated in Venice, the validity of the method is confirmed by the fact that the printer Giovanni Varisco (1558-1590) is always involved, either as a member of a publishing company or as a unique owner of an enterprise. The other three connections are new findings that can be further investigated.
The next two clusters each involve five book producers (Figure 8). From left to right, the members of the two clusters are listed in Table 2. The first cluster again has a genuine local character, situated in London. A quick look at the editions produced that belong to the corpus immediately shows that they all indeed produced the same work: The Arte of Navigation. This work was originally authored by Martin Cortés (1510-1582) in Spanish and produced by António Alvares (fl. 1544-1556) for the first time in Seville in 1551 under the title Breve compendio de la sphera y de la arte de navegar (Cortés 1551), a historical source that has attracted much attention (Crowther 2020;Ulla Lorenzo 2022). A second Spanish edition then appeared in 1556 (Cortés 1556). It does not contain Sacrobosco's treatise but a compendium that is clearly strongly influenced by it. The overall work is conceived for the application of cosmological and astronomical knowledge to the activity of navigation and was therefore compiled in a local tongue (Leitão 2013).
The compendium and one dedication letter by Cortés were translated into English by Richard Eden (1520-1576) and published by Richard Jugge (1514-1577) for the first time in 1561 (Cortés 1561). Jugge and his wife after him enriched and republished the work in 1572, 1579, and 1584 (Cortés 1572, 1579, 1584). A new reprint was then produced by Richard Watkins (fl. 1561-1598) and Abel Jeffes (fl. 1584-1599) in 1589 (Cortés 1589). It is from the next edition of the same work, published in 1596 and enriched with a series of computational tables useful for navigation, that the process of imitation took place. In this case, the apparent imitation is explained by means of the distinction between publisher and printer. The 1596 edition was published again by Watkins, but in cooperation with the publisher Hugh Astley (fl. 1588-ca. 1609) (Cortés and Tapp 1596). This edition shows a new text-part, the aforementioned collection of computational tables, which is authored by John Tapp (1575-1631), who belongs to the cluster under consideration here. Tapp, however, became a publisher himself and put the next three editions on the market in 1609, 1615, and 1630 (Cortés and Tapp 1609, 1615, 1630.  1625-1655). As a publisher, Tapp was most likely the owner of the format of the work and was therefore allowed to reprint it in the same way. From the Oxford DNB, we also know that William Stansby had been an apprentice of John Windet and later inherited his workshop. This component, in conclusion, not only validates our method but also shows that the awareness relationship can include economic relationships between book producers due to the emerging distinction between publisher and printer in the early modern book market.
In the second cluster, the connections are less dense but no less significant. Isabelle Pantin has clearly elucidated the early economic relationship between Guillaume Cavellat (1500-1576), the first scientific publisher of the modern era, and Jérôme de Marnef (1515-1595). Moreover, Denise Cavellat (fl. 1567-1616), who led the workshop after Guillaume's death, was the sister of Jérôme. Their activities had evident common commercial interests, which explains why their editions look similar (Pantin 1998, 2022; Pantin and Renouard 1986). Less known was the relationship between them and Jacques Quesnel (1590-1661), also active in Paris. After investigations, it turned out that the heir of Denise Cavellat, her son André Sittard (fl. 1581-1605), inherited the workshop and the books in stock from his mother and sold portions of the stock to Quesnel. Quesnel then most likely reissued them by adding the necessary title-page and colophon sheets in order to display his ownership. A closer inspection of two 1619 editions (Sacrobosco 1619; Sacrobosco et al. 1619) finally confirmed this conclusion, as they both display the printer devices of both Jacques Quesnel and Denise Cavellat (together with Jérôme de Marnef).
This cluster is particularly interesting because it is the first that shows a transregional connection between Paris and Venice, specifically between Cavellat and Girolamo Scoto (1505-1572), the latter being an important Venetian publisher and printer and member of an entire family active in the same business. Scoto published two editions of the Sphaera, in 1562 and 1569 (Sacrobosco et al. 1562, 1569). These not only display the same content of some of the Parisian editions realized by Cavellat, but Scoto even declares that they are imitations of those editions in the very title: "ex postrema impressione Lutetiae". In conclusion, the analysis of this cluster exemplarily shows the impressive predictive power of our approach and method.

A Venetian Cluster
The next cluster of the graph is constituted by 14 Venetian book producers (Figure 9). The members of the cluster, ordered according to the chronological sequence of their years of activity, are listed in Table 3. This component, strictly local, is particularly interesting because it shows that, based on the fingerprints, our method can display social relationships among book producers over long periods of time, in this case from the last decades of the 15th century until the second half of the 16th century. As the names of the book producers and their companies clearly show, such long intervals are possible because this business was first and foremost a family business in which each member often acted not only as member of the family company but also individually. Over generations, this characteristic of the business created the background against which long-term, stable partnerships could be established. The three important printer families of Scoto, Nicolini, and Sessa are active in this network, which is based on relationships ranging from business cooperation to pure imitation and the practice of Tauschhandel. As for validation, the CERL Thesaurus informs us that Francesco Bindoni (fl. 1523-1557) and Maffeo Pasini (fl. 1524-1551) were in a stable economic relationship between 1524 and 1542. The fact that the three Venetian families were working in the same trade zone (where "trade zone" means a geographic region in which agents conducted real business and not a vague and abstract encounter of interests) is well known (Nuovo 2013) and outstandingly investigated (Kikuchi 2017, 2018a, 2018b; Rideau-Kikuchi 2022). The fact, however, that smaller entrepreneurs such as Guglielmo da Trino (fl. 1485-1499), Giacomo Penzio (1450-1527), the same Bindoni and Pasini, and Francesco Rampazetto (fl. 1540-1576), who was excommunicated in 1573, were most likely enjoying some stability by being connected to them has never been so clearly shown as now.
A closer inspection of Penzio's 1519 edition, for instance, shows that it displays Melchiorre I Sessa's (1505-1565) printer mark (the cat with the mouse in his mouth) (Sacrobosco et al. 1519). Again, this cluster is validated empirically and shows the power of our method in generating research questions.

The Jesuits' Cluster
The next cluster is relevant not only because it beautifully validates our method but also because it displays a possible peculiarity of the early modern academic book market, as it correlates with the edition history of the aforementioned Christophorus Clavius's commentary on Sacrobosco's Tractatus de sphaera (Figure 10).
The members of the cluster, ordered according to the first time they produced an edition of Clavius's commentary, are listed in Table 4. The validity of this cluster becomes immediately evident by looking at the works produced by these book producers: they all produced the famous commentary on Sacrobosco's Sphaera authored by the Jesuit Christophorus Clavius. The first edition appeared in Rome in 1570 from the press of Vittorio Eliano (1528-1581) (Grendler 2022;Sacrobosco and Clavius 1570). As is clear from its first page, Eliano provided himself with extremely rigorous privileges over the work for 10 years. Most likely for this reason, the first of the following 19 new editions appeared in 1581 (Sacrobosco and Clavius 1581); this is the edition produced by Francesco Zanetti (1530-1591) and Domenico Basa (1500-1596), both members of the cluster. Because this second edition was less protected (or because Clavius learned how to protect himself from the publishers and printers!), many others appeared. The book producers of this cluster are responsible for all new editions, except one published in 1611. The 1611 edition is the one printed in Mainz within the great editorial initiative whose aim was to produce Clavius's Opera omnia, an initiative whose goal clearly differed from the aim of publishing a single textbook, as the commentary on the Sphaera is printed as a third volume (Sacrobosco and Clavius 1611). The cluster shows the usual aspects of the family business (Basa, Zanetti, and Gabiano), but not only that. As Ian Maclean showed, some of the editions were also advertised at the Frankfurt Book Fair, and the commentary became the target of real piracy (Maclean 2022).
In general, as we are confident in our results concerning mutual awareness among these printers and publishers, it seems quite peculiar that 18 editions, produced by very different players on the market and in 5 different European cities between 1581 and 1618, retained not only the same content (apart from 3 minor updates) but also exactly the same layout. This opens the question as to whether Clavius or the Jesuit order exerted control over their works that went beyond the content or, more likely, whether keeping the layout was a way for the publishers and printers to make sure that the work was easily recognized as a product of the Jesuits. It does not seem like enough information is available to answer this question now, but it could be relevant to mention that the textbooks of Jesuit scholars were mostly conceived for a closed market that did not interact significantly with either the standard transregional market or the local markets. The network of Jesuit educational institutions was provided with exemplars of such works (Grendler 2022). This implies some form of central management and therefore further confirms the existence of an awareness cluster among these book producers.
The last cluster discussed here displays the greatest spatial and temporal dimensions and is therefore a candidate for the most relevant in explaining one of the mechanisms of the process of the homogenization of knowledge (Figure 11).
The members of the cluster, ordered according to their periods of activity, are listed in Table 5.
The list contains one anonymous book producer, for whose work the place of publication is also unknown. This book producer is labeled "Anonymous 1598" because it concerns one edition published in 1598 (Blebel 1598). However, as the edition is a work authored by Thomas Blebel (1539, who published another 10 editions, all of them in Wittenberg and with 4 different book producers, we assigned Wittenberg as the place of publication of the Anonymous 1598 edition. Given the high number of book producers involved (28) and the correspondingly high number of editions to be considered, it is difficult to achieve an in-depth empirical analysis. However, these data are first and foremost validated by a series of general analyses of the entire corpus conducted elsewhere and already mentioned, by means of which we were able to show the emergence of a dominant epistemic family of treatises that were imitated all over Europe in their content. Such an epistemic family was created in Wittenberg beginning in the 1530s (Valleriani et al. 2019). The cluster under examination here shows a subgroup of book producers who borrowed not only the content of the Wittenberg textbooks but also their general layout. Moreover, further studies have shown that Wittenberg continued for decades to produce new text-parts (mostly in anonymous form, but whose author we were able to identify as Georg Rheticus), which enriched their textbooks and those produced in many other different places thereafter. Even text-parts or entire textbooks previously published elsewhere and then in Wittenberg around the middle of the 16th century immediately experienced international success. This continuous addition of novelties into Wittenberg textbooks (novelties we have defined as "enduring innovations") proved influential for many years and in many places, and contributed to keeping continental attention on Wittenberg book production (Zamani et al. 2020).
On the basis of the results mentioned in Section 1 we can state that the number of book producers who imitated the content of the Wittenberg textbooks is much higher than in the cluster shown here, but the fact that every member of the latter also belongs to the former, with no exception, is a clear validation of our method.
Figure 11. Cluster of early modern Wittenberg printers and publishers and of those in northern Europe who produced textbooks following the Wittenberg model.
Empirical validation has shown that imposing a maximal degree of connection while creating a graph based on the chains of editions does not create redundancies. The translation of the nodes of the graph from editions (or, more precisely, "print-events") into book producers, combined with a strict rule concerning the temporal intervals related to their lives, can disclose international clusters of social relationships among early modern printers and publishers. It does so by making use of the fingerprints together with general standard bibliographic data, and it helps us understand the characteristics and limits of the structure over which scientific knowledge evolved.

Historical Interpretation of the Imitation Phenomenon and Its Temporality
Each of the clusters we were able to identify in the Sphaera corpus, and that this method will allow us and others to identify in the future, is a discovery and the background against which new research questions can be formulated. What kind of relationships were in place among individual book producers? Exactly which text-parts were they copying or exchanging? These are the kinds of new questions that such research can suggest. A network of social awareness also emerged due to other causes that we do not (yet) know, be it the distribution routes of books, book fairs, or any other phenomena that might explain how a certain book producer got another book producer's book into her or his hands. However, even without entering the depth of the analysis, an overall interpretation referring to the Sphaera corpus as an example is already possible. Validated general graphs of awareness such as ours offer the possibility of quantifying one of the aspects that contributed to the process of the transmission, transformation, and homogenization of knowledge. This is related to the material, economic, institutional, and financial conditions that encouraged imitation in the early modern book market, a market that during its first century of life was highly unregulated and in which individual strategies could be fundamental to the survival of a business.
Taking as our starting point the Sphaera corpus, which comprises 359 different editions covering 178 years from 1472 on, we can posit that our collection of chains involves 63% of the corpus. This value does not rest on the number of editions used to build the graph of social awareness, as that graph was built under the condition that the book producers of connected editions had to be alive at the time of publication. Calculated without this condition (and therefore considering all the editions involved in the chains), the count climbs to 228 editions (63%) involved in the imitation process. Should further studies confirm such a value, it would mean that pressing financial considerations of the sort considered in this work impacted the evolution of scientific knowledge on more than 50% of the products of such knowledge, and especially on those that had to accomplish the fundamental function of distributing it.
We can moreover analyze the geo-temporal distribution of the phenomenon. For this purpose, we have created a dynamic visualization of chains of similar editions obtained by means of the application of the first metric, which was then empirically validated (Section 3.1.1; for the dynamic visualization see Video S1: https://www.mdpi.com/article/10.3390/histories2040033/s1).
It immediately appears evident that the phenomenon of imitation had an almost completely local character until 1539, and that this characteristic remained predominant for the entire period covered by the corpus. Only six chains show a real geographical transfer and transregional flow. This certainly indicates that behind the clusters discussed in the previous section are strong economic and production-related relationships that were eventually realized or realizable because of geographic proximity. Secondly, the first four transregional chains (Chains 8, 13, 23, and 30) developed in northern Europe, and only after 1564 do we have a chain (Chain 38) that connects Paris with Venice. It is only toward the end of the century that one chain (Chain 7) connects other regions of Italy to northern regions, but never at a higher latitude than Geneva. We have already demonstrated elsewhere that the diffusion and transformation processes of scientific knowledge do not appear to be dependent on or affected by the political and regional divisions that emerged in Europe because of the affirmation of the Protestant confessions (Valleriani et al. 2019; Zamani et al. 2020). These chains seem to support those conclusions, but they might also indicate the beginning of the emergence of a divide between northern and southern Europe.

Temporality of Imitation: Chains
Chain 13 and Chain 7 offer us the opportunity to compare the phenomenon of imitation in two different regions of Europe, though the comparison is difficult in this case because of the small amount of data beyond these chains. Experimentally, however, we can investigate the inherent dynamic of knowledge circulation by isolating and comparing these two major chains (Chain 13: Wittenberg, Antwerp, Paris, Frankfurt am Main; Chain 7: Rome, Venice, Lyon, Saint Gervais, Geneva) and taking into consideration the three fundamental parameters of time, space, and speed (Figures 12 and 13). As a chain is constituted of segments defined by the recurrences of editions, the three parameters are investigated on two levels: one concerning the entire chain (as, for instance, the total space traveled between the first and the last recurrence) and one concerning the segments (as, for instance, the single spaces covered between one recurrence and the next). The first is a meta-evaluation of the dynamics of a phenomenon that has taken place in the past (the historians' view), while the second gives a hint of how the contemporary historical actors might have perceived the same phenomenon.
The major difficulty in direct comparison is the fact that the two chains do not symmetrically overlap temporally. Chain 13 begins in 1535 and continues for 94 years. It is an almost secular phenomenon. Chain 7 begins in 1581 and continues for 37 years. Thus, Chain 7 falls entirely within the time span of Chain 13, but not vice versa. Broken down by the publication intervals, namely the average time between the recurrences of each chain respectively (4.75 years for Chain 13 and 2.17 years for Chain 7), the resulting similar normalized values of 19 and 17 might indicate that, although Chain 13 appears to us as a longue-durée phenomenon in comparison to Chain 7, both processes are characterized by a similar dynamic of recurrences.
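The normalization can be illustrated with a few lines of Python. This is only a sketch under one plausible reading of the text, namely that each chain's total duration is divided by its mean publication interval; the function name `normalized_recurrences` is hypothetical. The quotients come out close to the reported 19 and 17, and small differences may reflect rounding in the quoted inputs.

```python
def normalized_recurrences(duration_years, mean_interval_years):
    """Chain duration expressed in units of its mean publication interval.

    Assumption: this is how the 'normalized values' are obtained; the text
    does not spell out the formula.
    """
    if mean_interval_years <= 0:
        raise ValueError("mean interval must be positive")
    return duration_years / mean_interval_years


# Durations and mean intervals as quoted in the text.
chain_13 = normalized_recurrences(94, 4.75)
chain_7 = normalized_recurrences(37, 2.17)

print(f"Chain 13: {chain_13:.1f}  Chain 7: {chain_7:.1f}")
```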
We can also consider the total distance traveled, obviously disregarding the real means of transportation and communication at the time and considering only the air distance. The total distance traveled in both chains is indeed similar, but because of the total temporal lengths of the two chains (roughly a 1:3 ratio between them) the average speed (total distance divided by total time) of Chain 7 is about three times higher than that of Chain 13 (Chain 13: 53 km/year, Chain 7: 150 km/year). However, the consideration of the total time (and average speed) of a historical process is the historian's privilege. To realize how this process was perceived, and therefore to be able to make an assumption about how such perceptions might have influenced the phenomenon itself (for instance through buying or not buying books), we can analyze the parameters of the segments of the chains. We therefore calculate the instantaneous speeds, namely the speed for each segment as determined by two consecutive recurrences. We call this value "interval speed" (Figure 14). In this way it becomes immediately evident that Chain 13 displays a phenomenon of circulation of knowledge almost only in its first phase, roughly during the decade between 1540 and 1550. After this initial phase, it was established in several local markets (no distance traveled: speed = 0), and continuous recurrences of the editions maintained a high level of homogeneity among the historical sources in their places of publication. Chain 7 displays a similar, but longer initial phase. From the historians' perspective, we can easily state that this process was less successful, because no tradition was established. The overall contribution of the editions of Chain 7 to the process of the homogenization of knowledge was therefore rather marginal.
The dynamic of knowledge circulation, however, suggests that between 1590 and 1610 the editions of this chain must have been perceived as extremely successful and most likely contributed to the emergence of a slightly different scientific identity, limited to the regions of southern Europe, the Iberian peninsula excluded.
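The two speed measures discussed above, the average speed of a whole chain and the "interval speed" of each segment, can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the air distance is approximated with the haversine formula, the city coordinates are approximate, and the recurrence years in the example are hypothetical placeholders rather than values from the corpus.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def air_distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def interval_speeds(recurrences):
    """Speed in km/year for each pair of consecutive recurrences.

    `recurrences` is a list of (year, (lat, lon)) sorted by year.
    Segments with no elapsed time are skipped (speed undefined).
    """
    speeds = []
    for (y0, p0), (y1, p1) in zip(recurrences, recurrences[1:]):
        if y1 > y0:
            speeds.append((y1, air_distance_km(p0, p1) / (y1 - y0)))
    return speeds


def average_speed(recurrences):
    """Total distance divided by total time (the historian's view)."""
    total_km = sum(air_distance_km(p0, p1)
                   for (_, p0), (_, p1) in zip(recurrences, recurrences[1:]))
    return total_km / (recurrences[-1][0] - recurrences[0][0])


# Approximate coordinates; the years are hypothetical, for illustration only.
WITTENBERG = (51.87, 12.65)
ANTWERP = (51.22, 4.40)
example = [(1540, WITTENBERG), (1543, WITTENBERG), (1545, ANTWERP)]

speeds = interval_speeds(example)
mean_speed = average_speed(example)
print(speeds, round(mean_speed, 1))
```

A reprint in the same city yields an interval speed of 0, which is how the establishment of a local tradition shows up in this representation.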
This analysis, though deeply experimental and limited by the small amount of available data, can nevertheless open the doors to a form of quantitative analysis of the historical temporality of single processes from the distinct perspectives of both historians and historical actors, the latter being a subject that can be used to causally reconstruct the historical process itself.

Temporality of Imitation: The Whole Corpus
Moving to the whole corpus, a genuine ex-post historian's perspective offers the opportunity to analyze the temporality of the whole process. The chains of editions and the social networks achieved in this work allow us in fact to model the temporality of the imitation phenomenon in general.
We first split the number of editions involved in the chains into decades according to their publication years. Then we compare this number with the total number of editions produced in the same decade (Figure 15). The result is quite astonishing, as it shows that the percentage of editions involved in the chains begins at 50% of the entire corpus and increases over time to stabilize during the third quarter of the 16th century. However, the highest value reached by the end of the considered time frame is not indicative, as the amount of available data is very low. At this point it is possible to compare the previous results with a similar analysis of the social clusters. As mentioned above, to translate the chains of similar editions into social clusters, we added the condition that each pair of editions had to be printed by different book producers who were alive at the time of publication of both textbooks. By using the general graph of the connected editions (Figure 4), we can analyze the phenomenon of imitation by excluding cases of reissues and reprints by the same printer, as well as cases of imitation of editions produced by people who were already dead. In other words, we focus on imitation between potential partners.
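The per-decade comparison behind Figure 15 can be sketched as follows. The function names and the sample years are hypothetical and serve only to illustrate the bookkeeping; the real publication years are in the project's data files.

```python
from collections import Counter


def decade(year):
    """Map a publication year to the first year of its decade."""
    return year - year % 10


def chain_share_by_decade(all_years, chain_years):
    """Share of editions per decade that belong to some chain.

    `all_years`: publication years of every edition in the corpus.
    `chain_years`: publication years of the editions involved in chains.
    """
    total = Counter(decade(y) for y in all_years)
    in_chain = Counter(decade(y) for y in chain_years)
    # A Counter returns 0 for decades without chained editions.
    return {d: in_chain[d] / total[d] for d in sorted(total)}


# Hypothetical sample, not corpus data.
all_years = [1531, 1538, 1538, 1545, 1549, 1552]
chain_years = [1538, 1545, 1549]

shares = chain_share_by_decade(all_years, chain_years)
for d, share in shares.items():
    print(f"{d}s: {share:.0%}")
```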
To obtain this result we use approaches typical of social network analysis (Scott 2000). We calculate the in-degree value of each edition, namely the number of connections that reach each of them in the network. Then we aggregate these in-degree values on the basis of the years of publication of all editions involved. In this way, for every year in which at least one edition has been published that turns out to be influenced by at least one previous edition, we obtain the maximal and absolute number of in-degree relationships, namely relationships between editions that have influenced later editions. This means that the temporally last node of each network component corresponds to the edition with the highest in-degree value (the number of incoming connections). We interpret the in-degree value per year as indicative of the intensity of the phenomenon of imitation. Although in making this graph we have connected each edition of a component to all temporally subsequent editions of the same component (assumption of maximal connectivity), meaning the resulting aggregated in-degree value is not a strictly empirical value, it can nevertheless offer an indication of moments when the phenomenon intensified (Figure 16). The highest number of connections, which can be observed for the year 1607 (80 connections), means that in that year some editions were published that concluded (long) chains initiated earlier. To use these data to model the temporality of the phenomenon, therefore, we calculate a moving average (Figure 16, red line) that considers the last 10 time-steps, which, according to our data, each correspond roughly to a window of between 8 and 15 years. In this way we reach the conclusion that awareness relationships of the kind described in this work exerted an influence on the evolution of scientific knowledge beginning in the 1530s and reaching a first peak just after 1550.
The phenomenon then intensified again toward the end of the century, although the dataset of the corpus decreases in size beginning in 1620. As already mentioned, we do not consider the last 10 years as historically reliable enough to model any trend.
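The aggregation of in-degree values per year and the trailing moving average can be sketched as follows. This is a simplified illustration, not the published notebook: it assumes that per-year aggregation means summing the in-degrees of all editions published in that year, it applies maximal connectivity within each component, and it omits the additional conditions on book producers (different producers, both alive). The component data are hypothetical.

```python
from collections import defaultdict


def in_degree_per_year(components):
    """Sum of in-degrees per publication year under maximal connectivity.

    `components` is a list of components, each a list of (edition_id, year).
    Every edition receives one incoming link from each strictly earlier
    edition of the same component.
    """
    totals = defaultdict(int)
    for component in components:
        years = sorted(year for _, year in component)
        for year in years:
            earlier = sum(1 for y in years if y < year)
            if earlier:
                totals[year] += earlier
    return dict(sorted(totals.items()))


def trailing_moving_average(values, window=10):
    """Mean of the last `window` time-steps (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical components, not corpus data.
components = [
    [("A", 1531), ("B", 1538), ("C", 1550)],
    [("D", 1540), ("E", 1550)],
]

per_year = in_degree_per_year(components)
trend = trailing_moving_average(list(per_year.values()))
print(per_year, trend)
```

Because the time-steps are only the years with at least one influenced edition, a window of 10 steps spans a variable number of calendar years, which matches the 8 to 15 years mentioned in the text.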
On an abstract level, we can formulate the hypothesis that the process of the homogenization of knowledge was also due to a process of imitation among book producers that followed a wavelike temporal trend. Further studies will hopefully throw more light on this phenomenon.

Discussion
The method presented in this work is empirically validated through the data extracted from the Sphaera corpus and, especially, through the data collected by means of the dissection of the sources into text-parts. We suggest applying this method to the investigation of a great number of historical sources when standard bibliographic fingerprints are available, as they are, for instance, in the collection EDIT16 mentioned above or through the catalogs of archives and libraries in general. The method is particularly efficient for investigating works that, because of their functions, tend to repeat content (such as textbooks or religious and liturgical works). In general, however, the method leads to results in every instance in which new editions of the same work have been produced.
The advantages are clear. Without further investigations, the application of this method to a large repository automatically leads to historically meaningful clusters of book producers. As the definition of such clusters is interesting not only for book historians and bibliographers but for anyone interested in the development of knowledge in its real, material contexts (institutional, political, and economic), this method can become a fundamental instrument for navigating the big data gathered through 20 years of the digitization of historical sources in many regions of the world (Siebold and Valleriani 2022). As shown, the method also possesses a heuristic component: it allowed us, for instance, to discern the differences (and the consequences of the distinction) between the publisher and the printer, a distinction that has its roots in the European early modern period, and it pointed to specific behavior in the data correlated to works that emerged from the context of a specific religious order.
Further validation from new corpora will certainly fine-tune the procedure and the results, but the next step toward transforming the procedure into a tool that can be used out of the box concerns the 26% of editions involved in the chains of works that have similar fingerprints but that, on the basis of empirical, manual validation, had to be discarded. It is true that when moving to big numbers, real trends can be identified even if a significant number of mistakes is spread through the dataset, but there is, in our opinion, room for improvement.
The first and most obvious step in this direction is to add a condition related to book format to assess similarity among fingerprints. As similarity among fingerprints means similarity between editions (both concerning their content and their layout) the imitation could only be realized by using the same format as that of the imitated edition. By making use of this additional metadata, we could immediately improve the results by around 4%, meaning that the total number of editions involved in the chains drops from 301 to 283, while, as discussed above, the empirical validation decreases the number of editions to 228. The code that contains this further condition concerned with book format is the main result offered by the present work from a methodological perspective.
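The additional format condition can be pictured as a filter applied to pairs of editions that the fingerprint metric has already flagged as similar. The sketch below is not the code of the published notebook; the `Edition` record, its field names, and the format labels are assumptions made for illustration, and the fingerprint comparison itself is not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    """Hypothetical minimal record for one edition."""
    edition_id: str
    year: int
    book_format: str  # e.g. "8vo" or "4to"; labels are illustrative


def same_format(a, b):
    """True when both editions share the same (normalized) format label."""
    return a.book_format.strip().lower() == b.book_format.strip().lower()


def filter_pairs_by_format(candidate_pairs):
    """Keep only fingerprint-similar pairs that also share the book format."""
    return [(a, b) for a, b in candidate_pairs if same_format(a, b)]


# Hypothetical pairs assumed to have passed the fingerprint metric already.
a = Edition("A", 1540, "8vo")
b = Edition("B", 1545, "8vo ")
c = Edition("C", 1550, "4to")

kept = filter_pairs_by_format([(a, b), (a, c)])
print([(x.edition_id, y.edition_id) for x, y in kept])
```

Applying the format check after the fingerprint metric keeps the two conditions independent, so the effect of the new condition on the number of chained editions can be measured separately.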
Apart from the possibility of an approach that applies knowledge graphs in order to extract entity information and to provide semantically relevant embeddings based on graph walks and text processing (El-Hajj and Valleriani 2021), we believe that an application of machine learning technologies in computer vision, which enables an automatic layout comparison, would ultimately increase the accuracy of the current approach to a value close to that achieved by close reading. Such automatic layout analysis relies on training large neural networks that are able to identify the numerous elements of the scanned page based on large training datasets, such as the one presented by Papadopoulos et al. (2013). These elements are usually paragraphs, headers, images, and tables, and they in turn would allow us to automatically generate and compare layout maps on a corpus scale (Lombardi and Marinai 2020) and consequently to investigate the patterns of transformation of layouts as well, a potentially relevant aspect in understanding the phenomenon of imitation (Vierthaler 2016).
A precondition for the realization of this step is the existence of electronic copies of the works, a condition that, in many cases, is already fulfilled. However, if we are dealing with hundreds of thousands, if not millions, of pages to analyze, it also becomes clear that pairwise comparisons between each page of each preserved exemplar of each edition are not realistic because they would exceed the computing capacity at the disposal of even the best research institutions.
The application of our procedure has two advantages in this respect: First of all, it reduces the number of works that need to be compared, namely to only those that happen to belong to a chain of editions. Secondly, as the fingerprints are indicative of the positions of the pages in exemplars, which suggests similarities of content and layout, this feature allows researchers to limit the number of pages considered for each selected exemplar.
In this regard, we can benefit from the recent development of deep learning, which provides a wide array of neural network architectures (such as VGG16 and ResNet (He et al. 2016; Simonyan and Zisserman 2014)) that can be used to generate representative feature vectors of every page in question. These vectors, each a fixed-length list of decimal numbers, can be easily compared by relying on vector algebra to generate similarities between the pages they represent. Such an approach can give us a good general estimate of the similarity between pages, but often misses minute details due to information loss as a result of the dimensionality reduction that takes place when converting the page image (a relatively large 2D array) into a smaller feature vector. This is particularly important when it comes to analyzing a dataset such as the Sphaera corpus, where the pages originate from books that discuss the same topic and use similar graphical representations, meaning two similar pages may contain slightly different content.
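As an illustration of comparing feature vectors by vector algebra, the sketch below uses cosine similarity. The choice of cosine similarity is an assumption, since the text does not name a specific measure, and the short vectors are hypothetical stand-ins for the much longer outputs of a network such as VGG16 or ResNet.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical page embeddings, not produced by any real model.
page_a = [0.2, 0.8, 0.1, 0.0]
page_b = [0.25, 0.75, 0.1, 0.05]
page_c = [0.9, 0.0, 0.0, 0.4]

sim_ab = cosine_similarity(page_a, page_b)
sim_ac = cosine_similarity(page_a, page_c)
print(round(sim_ab, 3), round(sim_ac, 3))
```

Two pages with nearly identical layout and slightly different content would score close to 1 under such a measure, which is the loss of detail the paragraph above warns about.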
To remedy this shortcoming, we aim to separately analyze the similarity of further components of the Sphaera editions, namely tables and images, before generating a representative image based on their combined output. In this respect we have developed an accurate neural network image extractor, tailored to the needs of early modern manuscript data, as well as models to extract and interpret numerical tables, which are abundant in astronomical manuscripts (Eberle et al. 2020; El-Hajj et al. 2022).
By obtaining accurate localization of each of the page's components, as well as accurate individual similarity measures, we foresee that with the use of deep learning we will be able to generate fingerprints for large manuscript corpora automatically, which, combined with the fingerprint analysis methodology presented above, will allow us to better analyze and understand larger book trade networks and their effects on the evolution of knowledge in Europe and beyond.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/histories2040033/s1, Video S1: Dynamic visualization of the geo-temporal behavior of the 56 empirically validated chains of similar editions. We assumed that the links connecting the editions remained historically meaningful up to a maximal age of 90 years.
Data Availability Statement: The repository of the historical sources constituting the Sphaera corpus and introduced in Section 1 is available through the website of the research project The Sphere: Knowledge System Evolution and the Shared Scientific Identity of Europe (https://sphaera.mpiwg-berlin.mpg.de, accessed on 1 November 2022). The lists of text-parts introduced in Section 2.1 are also publicly available through the same repository. The code of the routine for the two versions of the similarity metric between fingerprints is available as a Jupyter notebook at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Notebooks/Fingerprint_Metrics.ipynb, accessed on 1 November 2022. The entire dataset of the corpus divided into three lists, concerning respectively the editions, their text-parts, and the people involved, is available in the folder https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Input_Data, accessed on 1 November 2022. Data for all the chains of similar editions as obtained after running the code of the first metric and involving 301 editions is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Chains_Data/Fingerprint_Chains_simple_condition.txt, accessed on 1 November 2022.
The same list of 56 chains involving 301 editions, enriched with metadata and displaying the results of the empirical validation in the column "Empirical Analysis" of the spreadsheet, is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Chains_Data/Fingerprint_Chains_simple_condition_Metadata.csv, accessed on 1 November 2022. The cleaned and empirically validated list of the chains of editions mutually similar in form and content and obtained by applying the first metric is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Chains_Data/All_Chains_Validated_simple_condition.csv, accessed on 1 November 2022. For the chain information, see the column "Real Chain Name" in the spreadsheet. The list of chains and editions involved by applying the second, less relaxed metric is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Chains_Data/Fingerprints_Chains_additional_condition_Metadata.csv, accessed on 1 November 2022. The list is already enriched by metadata. The list obtained by running the second metric and that contains the comments resulting from the empirical analysis (column: "Empirical Analysis") is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Chains_Data/All_Chains_Validated_additional_condition.csv, accessed on 1 November 2022. The code for the creation of the graph of Figure 4 is available as a Jupyter Notebook at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Notebooks/Awareness_relations.ipynb, accessed on 1 November 2022. Please also consider the technical requirements at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Notebooks/Requirements.txt, accessed on 1 November 2022.
The data to generate the network displayed in Figure 4 is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Network_Data/Editions_Network.csv, accessed on 1 November 2022. The network data also contain the information concerned with the rates of the different sorts of edge parameters as shown in Figure 5. All network visualizations have been created by means of the open-source software Cytoscape. The Cytoscape files for the complete social network as well as for the single components are available in the folder https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Cytoscape_Networks, accessed on 1 November 2022. This work makes use of previous research results achieved in the frame of the project The Sphere: Knowledge System Evolution and the Shared Scientific Identity of Europe. For the entire list of publications and research results, see https://sphaera.mpiwg-berlin.mpg.de/publications, accessed on 1 November 2022. The data displayed in Table 1 are available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Table_Data/FPS_Table_01.csv, accessed on 1 November 2022. The data of the different clusters are separated by an empty line in this and the following lists. The data displayed in Table 2 are available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Table_Data/FPS_Table_02.csv, accessed on 1 November 2022. The data displayed in Table 3 are available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Table_Data/FPS_Table_03.csv, accessed on 1 November 2022. The data displayed in Table 4 are available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Table_Data/FPS_Table_04.csv, accessed on 1 November 2022.
The data displayed in Table 5 are available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Table_Data/FPS_Table_05.csv, accessed on 1 November 2022. The most refined code to generate chains, which contains the condition concerned with the book format as discussed in Section 5, is available at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Notebooks/Fingerprint_Metric_plus_Format.ipynb, accessed on 1 November 2022. Please also consider the technical requirements at https://gitlab.gwdg.de/MPIWG/Department-I/sphaera/sphaera-fingerprint-paper/-/tree/main/Notebooks/Requirements.txt, accessed on 1 November 2022.