Establishing Genealogies of Born Digital Content: The Suitability of Revision Identifier (RSID) Numbers in MS Word for Forensic Enquiry

Introduction
The personal computing and desktop publishing revolutions of the 1980s and 1990s, coupled with the rise to prominence of the WWW and associated e-mail communications in the late 1990s and early 2000s, have brought about a fundamental shift in the generation of documents. Textual material now originates overwhelmingly in digital form, from outlining to writing and editing the final version of the text. This "born digital content" is poised to become the sole evidence for an author's writing process for all written materials, be they literary works, professional reports, academic journal articles, or formal corporate correspondence. Akin to heavily edited typewritten manuscripts replete with strike-outs and manual marginal annotations, digital manuscripts can retain evidence of the editing processes of deletion, rephrasing, and addition. Unlike typewritten manuscripts, however, the evidence for these editing processes is not visible to the casual, everyday user.
Two aspects of digital manuscript files created by word processing programs are of interest from a forensic science and digital archeology perspective: their intentional manipulation for the purposes of hiding information (steganography) [1][2][3][4][5][6][7] and their use to identify content generation and editing processes that may shed light on the origin of the information contained in the files. The latter is of importance for understanding the nature, sequence, and extent of textual modifications in, say, policy documents. Importantly, in cases where multiple, iteratively generated versions exist, it might be useful to be able to reconstruct the overall editing sequence and thereby arrive at a 'genealogy' of the various documents.
Microsoft Word (henceforth MS Word), supplied as part of the Microsoft Office package of desktop applications and available for the main operating systems of Microsoft Windows and Apple Mac OS, has become one of the most ubiquitously used word processing programs and has attained a near monopoly in the corporate, educational, and consumer space [8]. It is therefore useful to examine the suitability of MS Word files for digital forensics inquiries.
Using a large data set of MS Word document template files created by a single publisher, this paper provides a proof-of-concept demonstration to show that document genealogies can indeed be established using "revision save identifiers" (RSIDs) embedded in MS Word files, and that such genealogies are independent of metadata, such as time stamps, that can be easily manipulated.

Office Open XML Standard and the Implementation of Revision Save Identifier Numbers
From 2007 onwards, Microsoft implemented the Office Open XML standard [9], under which all document components (text, styles, settings, metadata, etc.) are stored in a compressed archive that is visible to the general user as a .docx file. In compliance with the standard, MS Word assigns a revision save identifier (RSID) number to each document (stored in the <rsidRoot> tag in the settings.xml file) and also to every document editing session that begins and ends with a save (or e-mail send) action. These RSID numbers are (i) embedded in a tag that encloses the respective text inserted during that editing session and (ii) stored in the settings.xml file as a sequence of eight-digit hexadecimal numbers. In the document XML file, each RSID is associated with a specific action such as a text insertion or deletion (if the track changes function was activated) as well as specific formatting such as bolding, the insertion of page and section breaks, and subsections of tables.
The Office Open XML standard requires that RSID numbers be "randomly generated based on the current time" and that "every editing session shall be assigned a revision save ID that is larger than all earlier ones in the same file" [9]. As noted elsewhere, MS Word does not faithfully implement the Office Open XML standard, and successive edits are not assigned RSID numbers of increasing value [10]. The Office Open XML standard also stipulates that "[a]n identical <rsid> value between two documents with the same <rsidRoot> shall indicate the same editing sessions" [9]. Thus, if a digital copy of a document is made, either by duplicating via the operating system (file copy) or via a "save as" action in MS Word, then both files will carry the same rootRSID and all other RSIDs listed in the respective settings XML files. The listed RSID numbers will start to diverge with the addition of new RSIDs, which reflect subsequent edit and save events that are unique to the duplicated files (Figure 1) [9]. Testing has shown that MS Word, both in its standalone desktop and distributed versions (Microsoft 365), does implement this part of the standard [10]. RootRSID numbers, as well as RSIDs assigned to an editing session, stay with a given document, even if this document is being shuttled around between multiple authors (with additional RSIDs allocated as editing progresses, irrespective of the computer used).
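To make the storage layout concrete, the following minimal Python sketch reads the rootRSID and the list of RSIDs from the text of a settings.xml file. The element names and the namespace follow the WordprocessingML schema; the sample fragment and its RSID values are invented for illustration, and the function name parse_rsids is an assumption rather than part of any tool used in this study.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/settings.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, heavily abridged settings.xml content (RSID values invented).
SAMPLE_SETTINGS = f"""
<w:settings xmlns:w="{W_NS}">
  <w:rsids>
    <w:rsidRoot w:val="00A1B2C3"/>
    <w:rsid w:val="00A1B2C3"/>
    <w:rsid w:val="00D4E5F6"/>
    <w:rsid w:val="00F7A8B9"/>
  </w:rsids>
</w:settings>
"""


def parse_rsids(settings_xml):
    """Return (rootRSID, [RSID, ...]) from the text of a settings.xml file."""
    tree = ET.fromstring(settings_xml.strip())
    val = f"{{{W_NS}}}val"  # attributes are namespace-qualified as well
    root_el = tree.find(f"{{{W_NS}}}rsids/{{{W_NS}}}rsidRoot")
    rsid_els = tree.findall(f"{{{W_NS}}}rsids/{{{W_NS}}}rsid")
    root_rsid = root_el.get(val) if root_el is not None else None
    return root_rsid, [el.get(val) for el in rsid_els]


root_rsid, rsids = parse_rsids(SAMPLE_SETTINGS)
print(root_rsid, len(rsids))
```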
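The divergence of copies described above can be sketched with simple set operations. The example below is illustrative only: the RSID values are invented, and compare_copies is a hypothetical helper name. It assumes that the RSID table of each file has already been read into a list.

```python
def compare_copies(root_a, rsids_a, root_b, rsids_b):
    """Compare the RSID tables of two files (values are hexadecimal strings)."""
    a, b = set(rsids_a), set(rsids_b)
    return {
        "same_root": root_a == root_b,  # same rootRSID: same document lineage
        "shared": sorted(a & b),        # editing sessions before the copy was made
        "only_a": sorted(a - b),        # sessions unique to file A after the split
        "only_b": sorted(b - a),        # sessions unique to file B after the split
    }


# Hypothetical values: B was copied from an early state of A, then both were edited.
original = ("00A1B2C3", ["00A1B2C3", "00D4E5F6", "0011AA22"])
copy_b = ("00A1B2C3", ["00A1B2C3", "00D4E5F6", "0033BB44", "0055CC66"])
result = compare_copies(*original, *copy_b)
print(result)
```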

Using Revision Save Identifier Numbers to Develop Genealogies of Born-Digital Content
The adherence of MS Word to that part of the Office Open XML standard allows us to develop genealogies of document generation. The process of developing these genealogies is best explained through a hypothetical example comprising seven files (F1-F7), each of which contains eleven RSIDs. For the sake of simplicity, these were given alphanumeric identifiers in Figure 2. Each of these seven files contains RSIDs that are unique to that file and some that are shared with one or more other files. The removal of all RSIDs that are not shared between two or more of these files, followed by a seriation, results in the RSID pattern shown in Figure 3.
In this example, three RSIDs (D, F, and I) are shared by all seven files, with two RSIDs shared by six files, three RSIDs shared by five files, one RSID shared by four files, and two RSIDs shared by two files each (Figure 3). Given that file F1 shares only three RSIDs with the other six files, and those files show other commonalities, F1 is the parent of the other six. Likewise, as F2 shares only five RSIDs with the other children of F1, files that also show other commonalities, F2 must be the parent of the remaining files. This logic allows us to develop a genealogy of files that can be represented in a phylogram (Figure 4). In the case of files F4 to F7, all have one additional RSID in common (RSID H), but differ, with files F4 and F5 sharing one RSID (T) that files F6 and F7 do not have, whereas files F6 and F7 possess an RSID (J) that is not shared with files F4 and F5. Thus, these four files all have a common ancestor, then split and split again. Genealogical priority cannot be resolved at the lowest level, however. While both F4 and F5 have a common direct ancestor, it is not clear whether F5 was split off from F4 or vice versa. A relative level of time depth can be established based on the number of shared RSIDs (representing the number of edits) between splits in the file sequence.
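The seriation logic of this hypothetical example can be sketched in a few lines of Python. The sketch is not the procedure used for the figures; it is a minimal illustration under stated assumptions. The RSID labels D, F, I, H, T, and J are taken from the example above, whereas the labels A, B, C, E, and G, the generated file-specific RSIDs, and all function names are invented. Note that nearest_ancestor only returns the closest ancestor among the files actually in hand; intermediate states (such as the common ancestor of F4 to F7 that already carried RSID H) are not represented by a file.

```python
from collections import Counter


def make_file(name, shared, total=11):
    """Pad a set of shared RSIDs with file-specific ones up to `total` RSIDs."""
    unique = {f"{name}_u{i}" for i in range(total - len(shared))}
    return set(shared) | unique


def seriate(files):
    """Drop RSIDs found in only one file, then order files and RSIDs."""
    counts = Counter(r for rsids in files.values() for r in rsids)
    shared = {r for r, n in counts.items() if n >= 2}
    kept = {name: rsids & shared for name, rsids in files.items()}
    rsid_order = sorted(shared, key=lambda r: (-counts[r], r))
    file_order = sorted(files, key=lambda n: (len(kept[n]), n))
    return file_order, rsid_order, kept


def nearest_ancestor(name, kept):
    """Closest sampled ancestor: the file with the largest proper subset of RSIDs."""
    candidates = [o for o in kept if o != name and kept[o] < kept[name]]
    return max(candidates, key=lambda o: len(kept[o])) if candidates else None


core = set("DFI")  # RSIDs shared by all seven files
FILES = {
    "F1": make_file("F1", core),
    "F2": make_file("F2", core | set("AB")),
    "F3": make_file("F3", core | set("ABCEG")),
    "F4": make_file("F4", core | set("ABCEGHT")),
    "F5": make_file("F5", core | set("ABCEGHT")),
    "F6": make_file("F6", core | set("ABCEGHJ")),
    "F7": make_file("F7", core | set("ABCEGHJ")),
}

file_order, rsid_order, kept = seriate(FILES)
for name in file_order:
    # One row per file; a dot marks an RSID the file does not carry.
    print(name, "".join(r if r in kept[name] else "." for r in rsid_order))
```

F4 and F5 end up with identical sets of shared RSIDs, so neither is a proper subset of the other, which mirrors the observation that priority cannot be resolved at the lowest level.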

Figure 3. RSID pattern after removal of unique RSIDs and seriation.

Aim of the Paper
Drawing on a large data set of MS Word document template files created for more than 400 journals published by a single publisher, this paper provides a proof-of-concept demonstration to show that document genealogies can in fact be established. At the conceptualization of the research, it was posited that, given the ease of workflow and the associated time savings, the overwhelming majority of the publisher's journal template files would have been created by cloning and modifying one or more existing templates. The technique discussed in this paper will gain in importance as the need for the conservation, curation, and interpretation of 'born digital' content increases [11][12][13][14].

Data Acquisition
By October 2022, the publisher Multidisciplinary Digital Publishing Institute (MDPI) (www.mdpi.com, accessed on 11 October 2022) had published 402 journal titles (see Appendix A for details). The publishing model requires authors to submit the manuscript in a close-to-publication-ready format, which entails the use of a journal-specific document template (Figure 5) that can be downloaded from the respective "Instructions for Authors" page. The document template file for each journal was individually downloaded. In addition, the Internet Wayback Machine (www.archive.org/web/, accessed on 10 February 2023) was used to acquire numerous archived copies of template files of several journals dating back to 2010, which covers the majority of journal titles (Figure 6).

Data Processing
MS Word files in the Office Open XML format, including the MS Word templates examined here (.dot), are compressed XML archives, which can be readily extracted by changing the file type from .dot to .zip.
Using a software tool written in Java, the RSID data (rootRSID and general RSIDs) were extracted from the settings.xml file (in the "word" subfolder) of each journal template file and written to a CSV data file. The same process was executed to extract the document creation and modification dates (file core.xml in the "docProps" subfolder). The RSID CSV file was sorted in MS Excel, with the RSID data for all templates sharing the same rootRSID subsequently extracted into separate worksheets for seriation. The seriated data were graphed to create a phylogram for several of the rootRSIDs.
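The extraction step can be approximated with standard-library tools. The sketch below is written in Python and is not the Java tool used in this study; the function names and the CSV column layout are assumptions. It reads word/settings.xml directly from the archive (Python's zipfile module identifies a ZIP archive by its content, so the file does not need to be renamed) and writes one CSV row per RSID. The in-memory demonstration archive stands in for a real template file.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def extract_rsids(source):
    """Return (rootRSID, [RSIDs]) for a path or binary file object of a Word file."""
    with zipfile.ZipFile(source) as archive:
        settings = ET.fromstring(archive.read("word/settings.xml"))
    val = f"{{{W_NS}}}val"
    root_el = settings.find("w:rsids/w:rsidRoot", NS)
    root_rsid = root_el.get(val) if root_el is not None else ""
    rsids = [el.get(val) for el in settings.findall("w:rsids/w:rsid", NS)]
    return root_rsid, rsids


def write_rsid_csv(named_sources, out):
    """Write one row per (file, rootRSID, RSID); the column layout is an assumption."""
    writer = csv.writer(out)
    writer.writerow(["file", "rootRSID", "RSID"])
    for name, source in named_sources:
        root_rsid, rsids = extract_rsids(source)
        for rsid in rsids:
            writer.writerow([name, root_rsid, rsid])


def demo_archive(root_rsid, rsids):
    """Build a tiny in-memory stand-in for a template file (demonstration only)."""
    body = "".join(f'<w:rsid w:val="{r}"/>' for r in rsids)
    xml = (f'<w:settings xmlns:w="{W_NS}"><w:rsids>'
           f'<w:rsidRoot w:val="{root_rsid}"/>{body}</w:rsids></w:settings>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/settings.xml", xml)
    buffer.seek(0)
    return buffer


out = io.StringIO()
demo = demo_archive("00A1B2C3", ["00A1B2C3", "00D4E5F6"])  # invented RSID values
write_rsid_csv([("demo.dot", demo)], out)
csv_text = out.getvalue()
print(csv_text)
```

For real files, a path such as a downloaded template can be passed to extract_rsids in place of the in-memory archive.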

Results
Given the effort involved in setting up a template file for a formal, print-ready manuscript, it was expected that the overwhelming majority of the journal template files would have been created by cloning and modifying a small number of existing templates. The assessment of the rootRSID of the template files for all 402 journal titles listed in Appendix A, however, found that this was not the case: 212 (51.7%) of all titles carry unique rootRSIDs. Two rootRSIDs account for 29.1% of all template files, representing 62 titles (00D77B5F) and 52 titles (00996419), respectively. Both were mainly present in journals that began publication in 2020 and 2021. A small number of journals with older commencement dates (2012-2015 period) also have one of these two rootRSIDs, which suggests that the templates for these journals were reworked at that time. A further three rootRSIDs are shared by more than 10 titles each, with another nine rootRSIDs being shared by two or more titles. Example proof-of-concept phylograms were established for six rootRSIDs (phylograms A-F, Figures 7-12), which account for 42.3% of all journals.
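A tally of this kind is straightforward once the rootRSID of each template is known. The following sketch uses invented journal titles and rootRSID values purely for illustration; root_rsid_summary is a hypothetical helper name, and the input is assumed to be a mapping of journal title to rootRSID.

```python
from collections import Counter


def root_rsid_summary(root_by_title):
    """Split titles into those with a unique rootRSID and groups sharing one."""
    counts = Counter(root_by_title.values())
    unique = sorted(t for t, r in root_by_title.items() if counts[r] == 1)
    shared = {r: n for r, n in counts.most_common() if n > 1}
    return unique, shared


# Hypothetical input (titles and rootRSID values invented).
roots = {
    "Journal A": "00AAAA01",
    "Journal B": "00AAAA01",
    "Journal C": "00BBBB02",
    "Journal D": "00AAAA01",
    "Journal E": "00CCCC03",
}
unique_titles, shared_groups = root_rsid_summary(roots)
share_unique = 100 * len(unique_titles) / len(roots)
print(unique_titles, shared_groups, f"{share_unique:.1f}%")
```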

Phylograms A and B reflect examples of successive creations and subsequent modifications of template files. The parent file of Phylogram A, based on root RSID 00A43C68, was created for the International Journal of Plant Biology, which in its creation and initial setup underwent 228 discrete edit and save actions. At that point, the file was duplicated for the journal Alloys (Figure 7). While the original file underwent another 74 edits until finalized, the template file for Alloys was edited six times before another duplicate was created for the journal Dietetics. While Alloys was edited another 14 times until finalization, the template file for Dietetics was edited eight times before it was duplicated for the journals Waste, Meteorology, and Organoids (Figure 7). The template for Waste was edited three times before it spawned the template for the journal Entomology, while the template for Organoids became the parent for successive sequences of editing and spawning templates for other journals (Figure 7). Phylogram B, based on root RSID 007B3BA3, shows a similar linear sequence of generational editing and spawning from a single parent after an initial 175 discrete edit and save actions (Figure 8).

Phylogram C, based on root RSID 00A556C8, shows that the template for the journal Batteries spawned the template for the Journal of Cybersecurity and Privacy after 114 discrete edit and save actions. Both journal templates then went on to spawn subsequent linear sequences of generational editing and spawning (Figure 9).
Phylograms E and F are, in essence, merely more complex variations of the previous phylograms. Phylogram E, based on root RSID 00D77B5F and representing 62 titles, commenced as the template file for the Journal of Marine Science and Engineering, which, after 71 discrete edit and save actions, spawned Oceans, which, after five edit and save actions, spawned Psychiatry International (Figure 11). That journal template spawned two journals, one of which, Applied Mechanics, spawned another two journals, one of which was Reproductive Medicine. So far, the pattern follows that shown in Phylograms A-C. The template for Reproductive Medicine spawned the Journal of Zoological and Botanical Gardens and the journal Obesities. Each of these three journals was modified for four to six edit and save actions and then acted as the parent for numerous other journal templates, often created via file copies. Obesities acted as the parent for successive templates. All of these underwent considerable individual edits. Of these templates, the one for Earth is of interest because, after 68 discrete edit and save actions, this template spawned numerous other templates. Of significance is that, in that section of the genealogy of Phylogram E, the majority of templates are not spawned by large-scale copy file actions but by spawning following short runs of four to six edit and save actions.
Theoretically, the time stamp data for document creation (dcterms:created) and modification dates (dcterms:modified) contained in the file core.xml (in the "docProps" subfolder) provide the time bracket between the initial creation of the file and the last save event. The document creation dates would therefore allow for an absolute measure of time for the relative file generation sequence established based on the seriation of the RSID numbers. The extracted document creation dates, however, show that the vast majority of template files (90.4%) were created within less than ten minutes of each other (Table 1, 2021-12-27, 04:45 to 04:54 GMT) at an average of 41 files/minute. This can only have been achieved if a series of completed template files, created as .docx files, were opened and saved as .dot templates. This save action will preserve all RSID codes and allocations embedded in the document but will generate a file with a new creation date. This is reinforced by the observation that 94.3% of these files were never modified. Of the remaining 9.6% with creation dates other than 2021-12-27, 82.1% were never modified, suggesting that the low percentage of files that were modified were those where some elements were overlooked in the final QA process. A small number of templates were finalized on 2022-10-09 between 04:38 and 04:41 (Table 1). Several of these belong to the end section of Phylogram E (Figure 13). While the relative chronology shows that the templates for the last five journals in the genealogy were successively cloned from the template for the journal Anatomia, adding the creation dates shows that Anatomia and the preceding journals were completed on 27 December 2021, while the templates for the new journals were added on 10 September 2022. At the same time, the template for the journal Receptors was also revisited and a new template saved. At the end of 2022, the template for the journal Analytics was revisited, and a new template was also saved.
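The time stamps can be read from core.xml in much the same way as the RSIDs are read from settings.xml. The sketch below is a hedged illustration: the namespaces are those of the Dublin Core elements used in core.xml, while the sample content, the function names, and the reading of "never modified" as identical creation and modification stamps are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime, timezone

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical, abridged docProps/core.xml content.
SAMPLE_CORE = """
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>MDPI</dc:creator>
  <dcterms:created xsi:type="dcterms:W3CDTF">2021-12-27T04:45:00Z</dcterms:created>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2021-12-27T04:45:00Z</dcterms:modified>
</cp:coreProperties>
"""


def parse_stamp(text):
    """Parse a UTC time stamp such as 2021-12-27T04:45:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_core(core_xml):
    """Return creator, creation stamp, and modification stamp from core.xml text."""
    tree = ET.fromstring(core_xml.strip())
    return {
        "creator": tree.findtext("dc:creator", namespaces=NS),
        "created": parse_stamp(tree.findtext("dcterms:created", namespaces=NS)),
        "modified": parse_stamp(tree.findtext("dcterms:modified", namespaces=NS)),
    }


def files_per_minute(created_stamps):
    """Count how many files were created within each clock minute."""
    return Counter(ts.strftime("%Y-%m-%d %H:%M") for ts in created_stamps)


info = parse_core(SAMPLE_CORE)
# Assumption: a file counts as never modified when both stamps are identical.
never_modified = info["created"] == info["modified"]
per_minute = files_per_minute([info["created"]])
print(info["creator"], never_modified, dict(per_minute))
```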

Historic Trajectories of Selected Journals
In view of the findings that the majority of journals launched before 2021 tend to exhibit unique rootRSIDs but that a large number of journals launched in 2021 and 2022 used templates that had been created from a small number of rootRSIDs, it was decided to trace back the template files for a select number of journal titles. In order to be able to consider a longer historic trajectory, titles were chosen that had been founded in 2009 and 2010. The choice of journals was restricted by the availability of older versions of template files that had been archived by the Wayback Machine. Overall, while most of the journals' main landing pages as well as the "Instructions for Authors" pages were regularly archived, this did not extend to the linked download files. As a result, the availability of past templates was patchy (Table 2). The early template files of the 2010 to 2015 period were based on the .doc format. This makes sense from a publisher's perspective, as the public adoption of the .docx format, based on the Office Open XML standard and introduced by Microsoft in 2007, was slow, and the publisher had to ensure compatibility. From late 2015 on, templates used the XML-based file format. Intriguingly, this format seems to have been abandoned during the second quarter of 2018 with the reintroduction of the .doc format. While there are no archived template files for any of the eight chosen journals for the years 2019 and 2020, template files created for other journals (as retrieved via the Wayback Machine) show that the .doc format was used both in 2019 (Heritage) and 2020 (Nanomaterials). The rationale for the temporary switch back to .doc format templates remains unclear. Template files changed back to the XML-based format during the second quarter of 2021 (Table 2, date supported by other journal titles).
An examination of the documents' XML files for the eight journals (Table 3) showed that the template files before and after the 2018-2021 return to the .doc file format differ significantly. The files from 2021 onwards all have in common that the rootRSID for each title remains the same for the 2021, 2022, and 2023 templates and that the total number of RSIDs listed in the templates increases each year. Clearly, the 2022 XML-based template for a journal was developed by cloning and editing the 2021 journal template, while the 2023 XML-based template was developed by cloning and editing the 2022 template. This conforms with the expectations for document genealogies as discussed above.
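The year-on-year relationship described here can be expressed as a simple test: a later template is consistent with being a descendant of an earlier one if both carry the same rootRSID and the later file lists all RSIDs of the earlier file plus at least one more. The sketch below rests on the assumption that RSIDs, once allocated, remain listed in the RSID table; the values and the helper name is_descendant are invented.

```python
def is_descendant(earlier, later):
    """earlier/later: (rootRSID, iterable of RSIDs) of two template versions."""
    root_e, rsids_e = earlier
    root_l, rsids_l = later
    # Same lineage, and the earlier RSID table is a proper subset of the later one.
    return root_e == root_l and set(rsids_e) < set(rsids_l)


# Hypothetical RSID tables for three yearly versions of one journal template.
t2021 = ("00A1B2C3", ["00A1B2C3", "00D4E5F6"])
t2022 = ("00A1B2C3", ["00A1B2C3", "00D4E5F6", "0011AA22"])
t2023 = ("00A1B2C3", ["00A1B2C3", "00D4E5F6", "0011AA22", "0033BB44"])

chain_ok = is_descendant(t2021, t2022) and is_descendant(t2022, t2023)
print(chain_ok)
```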

Discussion
The foregoing section has shown that the combination of rootRSIDs and general RSIDs contained in MS Word XML files can be used for seriation and that phylograms can be derived that represent document genealogies and relative chronological sequences. The phylograms clearly demonstrated whether templates were generated by copying an existing template file several times (e.g., Phylogram D, Figure 10), or whether creation sequences were largely iterative, with new templates spawned following minor edits (e.g., Phylogram F, Figure 12).
Theoretically, the extraction of file creation dates would allow us to place these relative chronological sequences on an absolute time scale, with the caveat (from a digital forensics perspective) that file creation dates can be manipulated. An absolute chronological sequence could not be demonstrated for the present case study, as 90.4% of all template files were created within less than ten minutes of each other. While the bulk of the data was not suitable for establishing absolute chronologies, it provided an insight into the production process. The data suggest that the individual template files had been cloned, modified, and completed after passing quality assurance. These finalized files were then saved from the .docx to the .dot format for use as templates, with the majority of these not undergoing a single further edit. Under "normal" document generation circumstances, such actions would be extremely rare, and absolute chronologies could be developed. The end section of Phylogram E (Figure 13) can serve to support the proof of concept.
The file creation date stamps have limitations, however, as the dcterms:created and dcterms:modified tags of MS Word record the computer time only in full minutes. Thus, they do not allow us to distinguish between a file copy using the operating system (and subsequent renaming), which would have a time stamp that is identical to the original to the second, and a copy made via a "save as" action, which would be likely to deviate by one or more seconds, assuming that the "save as" action occurs immediately after the final save of the parent file. If, however, sufficient time (for the computer clock to move to the next full minute) is allowed to elapse while the parent file is still open, or if that amount of time elapses between the last save (and close) of the parent file and the "save as" action of the reopened parent file, then the time stamps of the parent and the child files will diverge (verified by testing). The modification date stamp would indicate the time spent modifying the relevant child document.
Since the file data are coded to Z(ulu) time (i.e., GMT), global time zone differences can be ignored, a matter that would come into play when considering documents derived from international collaboration or contract cheating drawing on international tutors.
While the MS Word XML also allows for the document's creator to be identified via the <dc:creator> tag in the core.xml file, all creator IDs were, as expected, set to the corporate MDPI term. Consequently, individual editors cannot be identified from the metadata. Even though it is not possible to assess how many people worked on the creation of the templates represented by Phylograms A-C and attribute the changes to specific authors, the unique pattern of Phylogram D suggests, albeit not conclusively, that the same author duplicated the template for the journal Cardiogenetics several times before editing the copies one by one.

Conclusions
Since 2007, Microsoft has implemented the Office Open XML standard in its suite of desktop applications. Given the near monopoly held by MS Word in the corporate, educational, and consumer spaces, this provides opportunities for digital forensics. From a document forensics perspective, the allocation of revision save identifiers (RSIDs) to reflect a range of textual and formatting edits allows us to understand much of the generation, writing, and editing history of an MS Word document.
As this paper has demonstrated, the allocation of a unique document RSID (rootRSID) to each MS Word file, which persists even when that file is duplicated via "save as" events or via direct file copy in the operating system, forms the basis for document genealogies. Starting from common "ancestor" document content, copies begin to diverge as new content, and thereby the RSIDs associated with that new content, is added. Subsequent copying will spawn new generations with shared but diverging combinations of RSIDs. A seriation of the shared RSIDs contained in the XML metadata of MS Word files can be used to develop phylograms, which represent the relative genealogical relationship between documents created from the same ancestor file. Differences in document creation dates then allow this relative relationship to be translated into absolute chronological dimensions.
An examination of over 400 template files produced by the publisher MDPI, carried out as a proof-of-concept exercise, validated the theoretical model. Moreover, several of the phylograms, combined with document creation timestamp information, also provided insights into the publisher's template production processes, underscoring the potential of RSIDs for digital forensic inquiry.
The findings of this proof-of-concept exercise have wide-ranging implications. The unique allocation of document RSIDs, coupled with the sequential addition of RSIDs associated with new content, allows for the generation of parent-child relationships between documents. Importantly, these genealogies are methodologically independent of other metadata, in particular time stamps, that are easily manipulated. Where multiple versions exist that exhibit conflicting text, the establishment of document genealogies allows an investigator to understand from which document the deviating file was forked.
Positioning genealogically one file between two others, where the creation dates of all files are known, may confirm or contradict the metadata embedded in the file, thereby potentially highlighting willful manipulation of dates.
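Such a cross-check between genealogy and metadata could, for instance, flag every child file whose recorded creation date precedes that of its RSID-derived parent. The sketch below is a minimal illustration under stated assumptions: the file names, dates, and the helper name date_conflicts are invented, and a flagged pair is only a prompt for closer scrutiny, since a parent file that was re-saved under a new creation date would produce the same pattern.

```python
from datetime import datetime


def date_conflicts(parent_of, created):
    """Return (child, parent) pairs where the child predates its parent."""
    conflicts = []
    for child, parent in parent_of.items():
        if parent is not None and created[child] < created[parent]:
            conflicts.append((child, parent))
    return sorted(conflicts)


# Hypothetical genealogy (from RSID seriation) and creation dates (from core.xml).
parent_of = {"root.docx": None, "draft_b.docx": "root.docx", "draft_c.docx": "draft_b.docx"}
created = {
    "root.docx": datetime(2022, 1, 10, 9, 0),
    "draft_b.docx": datetime(2022, 2, 1, 14, 30),
    "draft_c.docx": datetime(2021, 12, 1, 8, 15),  # earlier than its parent
}
flagged = date_conflicts(parent_of, created)
print(flagged)
```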
Finally, from a future historian's point of view, the ability to establish genealogies of iteratively generated "draft" documents will give insight into policy development and formulation processes, akin to interpreting the strike-outs and file annotations encountered in traditional paper-based archival documents.

Figure 1. Implementation of RSIDRoot and RSID values. (A) original file; (B,C) file copy with subsequent unique edits.

Figure 4. Phylogram of a file genealogy based on the RSIDs shown in Figure 3.

Figure 5. First page of the MDPI journal template as it was current in 2022, containing sample text, images, and embedded stylesheets.

Figure 6. Frequency of journal releases by MDPI.

Figure 11. Phylogram (E) for root RSID 00D77B5F. Most journals commenced in 2021. Titles marked with an asterisk (*) indicate journals where the 2023 template file was used.

Figure 13. End section of Phylogram E with document creation dates added. The publication of the journals Future and Journal of Vascular Diseases started in 2022, while the journals Air, Arthropoda, and Targets formally commenced in 2023. Clearly, however, preparations for this had commenced long before.

Table 1. File creation dates and times (GMT) as extracted from the core.xml file.

Table 2. File format of template files 2010-2023 available for selected journals founded in 2009 and 2010.

Table 3. Root RSID, total number of RSIDs listed in the RSID table, and file creation and modification dates (in italics) of the XML files of selected journals founded in 2009 and 2010.
RSIDs (average 1416 RSIDs) than the templates from the 2021-2023 period (average 172 RSIDs). Looking at the file creation dates, numerous files were created on the same date (2016-05-03), suggesting the same QA processes were at work in 2016 as they were in 2022.