The Importance of Free and Open Source Software and Open Standards in Modern Scientific Publishing

In this paper we outline the reasons why we believe that reliance on proprietary computer software and proprietary file formats in scientific publication has negative implications for the conduct and reporting of science. There is increasing awareness of, and interest in, the benefits offered by free and open source software within the scientific community. We discuss the present state of scientific publishing and the merits of advocating for wider adoption of open standards in science, particularly in the publishing process.

It is near-universal practice for scientific journals to require authors to submit prepared manuscripts in proprietary file formats. By extension, this means that as authors we are to some degree restricted to using proprietary software. The formats most commonly used by journals in the peer-review, editorial and publication processes are DOC/DOCX for written text and XLS/XLSX for graphs and tables [1]. PPT/PPTX files are sometimes requested for graphs or embedded images [2]. These formats have a number of issues associated with them which ultimately make science less open and less transparent, and make the scientific authorship process less accessible.
Firstly, in order to read, edit and create documents in these formats with complete compatibility, we are required to license proprietary software, often at great expense. This has a discriminatory impact on researchers of modest means, because they are forced to purchase proprietary software even though free and open source software (FOSS) alternatives are widely available. While many FOSS alternatives such as LibreOffice do allow reading, editing and exporting into proprietary formats such as DOC/DOCX [1], the Microsoft Corporation has not allowed the release of full documentation on its formats for optimal compatibility [3]. Indeed, doing so would represent a potential threat to its business model [4].
Thus FOSS developers have had to reverse engineer proprietary formats to enable compatibility. While the results of reverse engineering have generally been acceptable, they are far from optimal. This pay-to-participate model in science is undesirable and should not be a feature of an inclusive scientific publishing environment. Furthermore, because the Microsoft Corporation makes slight changes to its file formats with each version release, users become locked into a proprietary ecosystem of software upgrades [4]. Not only does the company make these changes, it does so to (1) force users to buy new versions of the software and (2) create a moving target for competitors, who must reverse engineer each new version, again increasing the chances that users will need to buy the commercial proprietary software rather than use the alternatives to remain compatible. Because of these often subtle changes, the competition will at a minimum lag behind and, at worst, never support certain versions or features.
Secondly, using proprietary formats can pose important security and confidentiality risks. Because MS-Word allows full macro-scripting, it has become a common carrier for computer viruses [5]. This means that a malicious computer program can be embedded within a DOC/DOCX file and run without the recipient's permission each time they view the file on their computer. Also, due to the way in which MS-Word stores its version changes, it has been possible for recipients to see prior drafts of the sender's document that may contain confidential information [6].
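As an illustration of the macro risk described above, the following minimal Python sketch checks whether a ZIP-based Office package carries an embedded VBA project before it is opened. It rests on an assumption not stated in this paper: that macros in the newer ZIP-based Office formats are conventionally stored in a part named vbaProject.bin (for Word, word/vbaProject.bin). The function name has_vba_macros and the helper _make_package are hypothetical, the sample packages are placeholders rather than real documents, and legacy binary DOC files are outside the scope of the sketch.

```python
import io
import zipfile


def has_vba_macros(package_bytes: bytes) -> bool:
    """Return True if a ZIP-based Office package contains a VBA project part.

    Assumption: macros are stored in a part whose name ends in
    'vbaProject.bin'. Legacy binary DOC files are not ZIP archives and
    cannot be inspected this way, so they raise ValueError.
    """
    buffer = io.BytesIO(package_bytes)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("not a ZIP-based package; cannot inspect")
    with zipfile.ZipFile(buffer) as package:
        return any(
            name.lower().endswith("vbaproject.bin")
            for name in package.namelist()
        )


def _make_package(part_names):
    """Build a placeholder ZIP package in memory (hypothetical test helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in part_names:
            package.writestr(name, b"placeholder")
    return buffer.getvalue()


# Two stand-in packages: one without and one with a VBA project part.
plain = _make_package(["[Content_Types].xml", "word/document.xml"])
macro = _make_package(
    ["[Content_Types].xml", "word/document.xml", "word/vbaProject.bin"]
)

print(has_vba_macros(plain), has_vba_macros(macro))
```

A check of this kind only reports that a macro container is present; it says nothing about whether the macro is malicious.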
Thirdly, storing important data in proprietary file formats puts that very data at risk of being lost. Computers and software have made the storage of data more convenient and, in many instances, safer. However, data stored in a proprietary file format today may not be readable in the future. The very programs used to record data, and the file formats in which the data are stored, can become obsolete over time. Furthermore, because those same programs and file formats may be the property of a corporation, if the company goes bankrupt and the software is pulled from the market, data stored in these formats may be lost as well [7]. If the source code for these programs and file formats were made available, as is the case with FOSS under an appropriate license, a programmer could with some effort resurrect the original software to read and recover the data. For these reasons, the continued use of proprietary formats for archiving scientific data not only represents a hindrance to scientific openness and reproducibility, it could be harming the very conduct of science.

What Does Free and Open Source Mean?
The Free Software Foundation, which champions the use of free software, defines free software as respecting a user's freedom to run, copy, distribute, study, change and improve the software. The organization goes on to state that "when users don't control the program, the program controls the users. The developer controls the (proprietary) program and through it controls the users" [8]. Having access to the program's underlying source code is a precondition for the above.

The Benefits of FOSS and the OpenDocument Format
There is increasing interest in the scientific community in the benefits offered by FOSS and, increasingly, in "open source hardware" for making scientific tools [9,10]. Free and open source operating systems such as the GNU/Linux system and the BSD variants are open, stable and scalable. Their features include parallel computing [11,12], multi-core processing [13], and portability to small and embedded devices [14]. FOSS operating systems run most of the world's web servers. They are responsible for high performance scientific computing at centers such as CERN [1,15], where mathematical simulations are carried out under Linux environments using open source tools such as GNU Octave and Scilab, largely supplanting the proprietary MATLAB [16].
Even the development process of FOSS resembles the peer-review process of scientific publishing. A software developer creates a piece of software and releases the source code to the community, where other developers contribute improvements or voice their concerns over potential flaws. In this way security concerns or bugs in the software are generally fixed more rapidly than in proprietary environments [17]. In an open system, even end users have the ability to audit the underlying code of FOSS and have the final say on what that software does on their computer at any given time. These same benefits extend to open file formats.
Open file formats, such as the OpenDocument Format (ODF, with ODT as its text document type), rely on the input of an international multi-disciplinary consortium of standards organizations, information technology firms and even governments [18]. The creation of a usable international document standard that is open, free, backwards compatible and fully documented, so as to ensure legacy archival, is not only in the interest of data archivists, it is key to the conduct of science for the reasons outlined above. In Table 1 we provide a list of presently maintained and commonly used FOSS word processing packages, many of which use the ODT standard by default. In Table 2 we provide a list of FOSS graphing packages that use non-proprietary file formats by default.
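To make the archival argument concrete, the sketch below shows how little tooling is needed to recover text from an OpenDocument text file. An ODT file is a ZIP package whose main body is stored as XML in a member named content.xml, so the Python standard library alone is sufficient. The function name extract_paragraphs and the sample content are hypothetical, and the in-memory package built here is a simplified stand-in (it omits the manifest and styles of a complete ODT file) rather than a document produced by any particular word processor.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_paragraphs(odt_bytes: bytes) -> list:
    """Return the text of every paragraph element found in content.xml."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    # Paragraphs are <text:p> elements; itertext() also collects nested spans.
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


# Simplified stand-in for the body of an ODT file (illustrative only).
SAMPLE_CONTENT = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    "<office:body><office:text>"
    "<text:p>Open formats keep data readable.</text:p>"
    "<text:p>No proprietary software is required.</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
) % (OFFICE_NS, TEXT_NS)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    package.writestr("content.xml", SAMPLE_CONTENT)
sample_odt = buffer.getvalue()

for paragraph in extract_paragraphs(sample_odt):
    print(paragraph)
```

Because both the container (ZIP) and the markup (XML) are openly documented, a reader of this kind could be rewritten in any language long after the original authoring software has disappeared.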

Conclusions
This manuscript was prepared entirely using FOSS (Linux Mint, LibreOffice, Zotero). Unfortunately, during the final preparation and submission for peer review, we were required to export the final manuscript into the required DOC format. In a more open submission process, the final step of exporting to a proprietary file format would have been prohibited for the reasons that we outline in this communication. It is hoped that, through greater awareness of the current problem, more journals might offer authors the ability to submit their works in a documented, ISO standard file format such as ODF, which is accessible to all now and will remain so in the future. If science depends on openness and the collaborative pooling of ideas to solve big questions, then why should the very communication of scientific results be dictated by the use of closed corporate software models? In order to change the current paradigm, a critical mass of researchers has to be reached via general science and engineering journals, with the aim of informing them of the importance of using (and requesting) a wider adoption of open formats in their work and publication. Publishers themselves also need to be addressed and made aware of the importance of requesting, and even requiring, open file formats from authors. Newer versions of Microsoft Office will write to ODF, and plug-ins are available for older versions. This makes a stronger case for publishers accepting ODF, since those who choose to use Microsoft products may continue to do so. It also avoids the reverse of the problem described in this paper, namely locking out existing users of proprietary software in favor of those using non-commercial tools. The ODF format is all-encompassing, whereas Microsoft formats are not.

Table 1. Overview of a selection of presently maintained and commonly used free and open source software (FOSS) word processing packages.

Table 2. Overview of a selection of presently maintained and commonly used FOSS graphing applications.