Python Programming in PyPI for Translational Medicine

This is the world’s first tutorial article on Python packaging for beginners and practitioners in translational medicine or medicine in general. This tutorial will allow researchers to demonstrate and showcase their tools as PyPI packages around the world. Nowadays, researchers in translational medicine need to deal with big data. This paper describes how to build executable Python Package Index (PyPI) code and packages. PyPI is a repository of software for the Python programming language with 5,019,737 files and 544,359 users (programmers) as of 19 October 2021. Programmers must take five steps: first, scrape a dataset over the Internet; second, read the dataset file in csv format; third, build a program to compute the target values; fourth, convert the Python program to a PyPI package; and fifth, upload the PyPI package. This paper presents the covidlag executable package as an example for calculating the accurate case fatality rate (CFR) and the lag time from infection to death. You can install covidlag with the pip terminal command and test it. This paper also introduces the deathdaily and scorecovid packages, and PyPI Stats, which reports how many users have downloaded a specified PyPI package. The usefulness and applicability of a developed tool can be verified with PyPI Stats through the number of downloads.


Introduction
With the rapid progress of open-source software [1][2][3], many researchers have used the Python language not only for translational medicine but also for other medical applications in general [4][5][6]. The Python Package Index (PyPI) is an open-source repository of software for the Python programming language with 5,019,737 files and 544,359 users (programmers) as of 19 October 2021. Useful libraries and packages are publicly available and can be installed with a simple pip command on the terminal.
It is important for researchers to share useful and applicable libraries with others. The pip terminal command allows you to install any PyPI library on your system with the following command: pip install [xxx] where [xxx] is the name of a PyPI library.
For example, OpenCV [6,7] is an image processing package, where [xxx] should be replaced with opencv-python for installation by the pip command. scikit-learn [4] is a collection of machine learning algorithms, including ensemble methods. TensorFlow [8] is a symbolic math library based on dataflow and differentiable programming for training and inference of deep neural networks. Keras [9] acts as an interface for TensorFlow. PyTorch [10] is a machine learning library used for applications such as computer vision and natural language processing. Recently, PyTorch subsumed You Only Look Once [11] (YOLO-v5), which is an object detection library for computer vision and image processing that deals with detecting instances of semantic objects (humans, cars, dogs, and so on). MediaPipe is a library that offers cross-platform, customizable machine learning solutions for live and streaming media.
Four points play a key role in creating a new tool or a PyPI package: (1) data collection, (2) feature engineering tool, (3) data visualization method, and (4) data analysis functions.
(2) The pandas library is used to extract daily deaths.
(3) The matplotlib library is used for visualization.
(4) numpy.poly1d from the numpy library is used as a curve-fitting function to predict the number of daily deaths.
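Points (2) and (4) can be illustrated with a minimal sketch. It is not taken from the covidlag source: the CSV text, the column names (location, date, new_deaths), and the polynomial degree are assumptions chosen so that the example is self-contained. It reads a small table with pandas, extracts the daily deaths for one country, and fits a polynomial with numpy.polyfit and numpy.poly1d.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text standing in for a downloaded dataset.
# The column names are assumptions made for this sketch.
rows = ["location,date,new_deaths"]
for day in range(10):
    date = f"2021-10-{day + 1:02d}"
    rows.append(f"Japan,{date},{2 * day**2 + 3 * day + 5}")
    rows.append(f"Elsewhere,{date},{day}")
csv_text = "\n".join(rows)

# (2) Use pandas to extract the daily deaths of one country.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
japan = df[df["location"] == "Japan"].sort_values("date")
deaths = japan["new_deaths"].to_numpy(dtype=float)

# (4) Fit a polynomial to the daily deaths and wrap it in numpy.poly1d.
x = np.arange(len(deaths))
coefficients = np.polyfit(x, deaths, deg=2)
curve = np.poly1d(coefficients)

# Evaluate the fitted curve one day past the end of the data.
next_day_estimate = float(curve(len(deaths)))
print(round(next_day_estimate, 3))
```

Because the synthetic deaths follow an exact quadratic, a degree-2 fit reproduces them; real data are noisy, and a high-degree polynomial extrapolates poorly beyond the fitted range.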
This paper shows how the covidlag library was built as a PyPI package; it is an example of executable PyPI code [12]. covidlag is a Python library that runs on Linux, Windows, and macOS and calculates the accurate case fatality rate and the lag time from infection to death. It should be noted that the covidlag tool has not yet been peer-reviewed.
In order to build a PyPI package, you must understand basic shell commands (bash or zsh) and be familiar with them [13]. For example, popular commands are "ls", "cd", "pwd", "grep", "echo", "sed", "cut", "awk", "sort", "uniq", "less", "wc", "df", "cat", and "which", together with the operators ">" and "|". You should be familiar with the tree structure of files and directories (folders) and with the concept of PATH. The PATH variable plays a key role in command execution. The directories in the PATH variable are searched in the order they are given. In other words, when several directories contain a command with the same name, the PATH variable determines which one is executed.
In the covidlag program, the following libraries are used: pandas, numpy, matplotlib, sys, subprocess, sklearn (scikit-learn), and scipy. Of these, sys and subprocess are part of the Python standard library. Program code is a collection of math expressions (functions) without any ambiguity.
In order to use the Python language, you must choose a proper installation package, such as miniconda, depending on your operating system, from the following site: https://docs.conda.io/en/latest/miniconda.html (accessed on 23 November 2021).
Windows users have two options for miniconda: one on Windows 10 and the other on Windows Subsystem for Linux (WSL). WSL is a compatibility layer for running Linux binary executables (in ELF format) natively on Windows 10 and Windows 11. WSL has not been completed yet, but you can run binary executables on Windows from the WSL command line.
In the commands shown in this tutorial, the "$" sign is the shell prompt and is not part of the user's input. In covidlag, the wget command is needed.
covidlag was selected for this tutorial because the current computation methods for the case fatality rate (CFR) have problems. In a retrospective observational study, the CFR is computed by dividing the number of deaths due to COVID-19 by the number of cases or infected individuals. In other words, the CFR is the ratio of the number of deaths to the number of confirmed cases of the disease.
In the conventional methods, the number of sampled days and its range (start and end dates) have to be determined manually in order to calculate a value of the CFR. The covidlag package in this tutorial calculates the CFR automatically by exploiting the maxima and minima points, based on the strong correlation between infection and death. The scipy library is used to find the maxima and minima points.
In open-source programming, new programs are built by simply gluing existing libraries together.
In other words, skill in open-source programming lies in selecting good libraries from the variety of existing libraries. The more examples that are available for open-source libraries, the easier it is for users to create the desired code.
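The PATH search order can be sketched in a few lines of Python. The helper name find_first_on_path is hypothetical and written only for this illustration; the standard library function shutil.which performs the same kind of lookup and is what one would normally use.

```python
import os
from typing import Optional, Sequence


def find_first_on_path(command: str, directories: Sequence[str]) -> Optional[str]:
    """Return the first executable file named `command`, searching in order."""
    for directory in directories:
        if not directory:
            continue  # skip empty PATH entries in this sketch
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate  # earlier directories win over later ones
    return None


# Split the PATH variable into its directories, in search order.
path_directories = os.environ.get("PATH", "").split(os.pathsep)
print(find_first_on_path("pip", path_directories))
```

If two directories both contain a pip executable, the one listed first in PATH is returned, which is why the order of PATH entries matters when several Python installations coexist.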
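As a worked example of the CFR definition, the short sketch below divides a number of deaths by a number of cases. The function name case_fatality_rate is hypothetical; the two input values are the US figures quoted in the Discussion of this paper.

```python
def case_fatality_rate(deaths: float, cases: float) -> float:
    """CFR = number of deaths / number of confirmed cases."""
    if cases <= 0:
        raise ValueError("the number of cases must be positive")
    return deaths / cases


# US figures quoted in the Discussion: 2090 deaths and 170056 cases.
us_cfr = case_fatality_rate(2090, 170056)
print(f"{us_cfr:.4f}")  # prints 0.0123
```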
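The idea of pairing peaks can be sketched as follows. This is not the covidlag implementation: the synthetic curves, the 14-day lag, the 1% fatality rate, and the choice of scipy.signal.argrelextrema are all assumptions made for illustration, since this paper only states that scipy is used to find maxima and minima.

```python
import numpy as np
from scipy.signal import argrelextrema


def bump(t, center, width, height):
    """Smooth bell-shaped wave used to build synthetic epidemic curves."""
    return height * np.exp(-((t - center) / width) ** 2)


days = np.arange(200)
assumed_lag = 14      # days from infection to death (synthetic assumption)
assumed_cfr = 0.01    # deaths per case (synthetic assumption)

# Two synthetic waves of cases; deaths follow the same shape, later and smaller.
cases = bump(days, 50, 10, 1000) + bump(days, 130, 12, 2000)
deaths = assumed_cfr * (
    bump(days - assumed_lag, 50, 10, 1000) + bump(days - assumed_lag, 130, 12, 2000)
)

# Indices of local maxima and minima of each curve.
case_peaks = argrelextrema(cases, np.greater)[0]
death_peaks = argrelextrema(deaths, np.greater)[0]
case_minima = argrelextrema(cases, np.less)[0]

# Pair the k-th case peak with the k-th death peak.
lags = death_peaks - case_peaks
cfrs = deaths[death_peaks] / cases[case_peaks]
print(lags.tolist(), np.round(cfrs, 4).tolist())
```

The pairing assumes that both curves have the same number of peaks; noisy real data would first need smoothing (for example, by the polynomial curve fitting mentioned earlier) before the extrema are meaningful.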

Testing Python Environment
It is assumed that Python is ready to run on the terminal. First, make sure that the pip command can be found through the PATH variable with the following command:
$ which pip
Type the following command to install covidlag:
$ pip install covidlag
The installation may report several errors. Remember that the pip command is not fully automated, so you may need to install the following libraries before installing covidlag: pandas, numpy, matplotlib, scikit-learn, and scipy. Error messages tell you which libraries are missing in the current Python environment.
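Before installing, the presence of the required libraries can also be checked from Python. The sketch below is only an illustration; the helper name missing_modules is hypothetical, and it relies on the standard importlib.util.find_spec function. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names of the third-party libraries listed above.
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "sklearn", "scipy"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every required library is already installed.
print(missing_modules(REQUIRED_MODULES))
```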
If you are ready to run covidlag, make sure you are running an X server and type the following command:
$ covidlag Japan 500 13
This calculates the case fatality rate (CFR) and the lag time in Japan using 500 days from 18 October 2021 with a 13th-degree polynomial regression curve fit. A single graph should pop up on the screen as shown in Figure 1. Figure 1 shows the number of daily cases (infected) and the number of daily deaths by dimmed black colored lines. The graph has the number of daily cases on the left Y-axis and the number of daily deaths on the right Y-axis. The horizontal axis is 500 days from 18 October 2021. A blue line indicates daily cases, while a red line indicates daily deaths. Red points show maxima and blue points show minima. The legend contains the values for calculating the CFR and the lag time from infection to death using maxima.

In addition to the graph, the following information is generated on the terminal for maxima and minima:
maxima information
death peak: 2020-07-06
death peak: 2020-09-15
death peak: 2021-01-20
death peak: 2021-05-25
death peak: 2021-09-12
case peak: 2020-07-08
case peak: 2020-09-
The current CFR is used for retrospective studies. In the conventional methods, the number of days and the start and end dates are required for CFR computation. covidlag can automatically generate values for CFRs and lag times from a raw dataset.
README.md
The README.md file can be easily prepared on the GitHub site. You need to have a GitHub account. When creating a new repository, select "add a README file". README.md will be created when you enter the necessary content for the new PyPI package. Remember that an image in README.md should be linked by its global (absolute) image address on GitHub instead of a local address. Unless the image is linked by the global address, it will not be displayed on the PyPI site.

setup.py
The following is a template of the setup.py file for creating an executable command. Shaded text should be changed for your PyPI package.
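As an illustration of what such a file typically contains, here is a minimal sketch of a setup.py that installs a terminal command. It is not the covidlag setup.py: the package name mytool, the version, the description, the dependency list, and the entry point mytool.mytool:main are all hypothetical placeholders. The keyword arguments themselves (name, version, packages, install_requires, entry_points) are standard setuptools options.

```python
# setup.py (sketch): every value below is a placeholder to be replaced.
from pathlib import Path

# Use README.md as the long description when it is present.
readme = Path("README.md")
long_description = readme.read_text(encoding="utf-8") if readme.exists() else ""

SETUP_KWARGS = dict(
    name="mytool",                      # hypothetical package name on PyPI
    version="0.0.1",                    # increase for every upload
    description="One-line summary of the tool",
    long_description=long_description,
    long_description_content_type="text/markdown",
    packages=["mytool"],                # directory that holds __init__.py
    install_requires=["pandas", "numpy", "matplotlib"],
    entry_points={
        # Creates a `mytool` terminal command that calls main() in mytool/mytool.py.
        "console_scripts": ["mytool = mytool.mytool:main"],
    },
)


def run_setup():
    """Hand the metadata to setuptools."""
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

A real setup.py would end with a call to run_setup() (or call setup(...) directly at module level); the call is wrapped in a function here only so that the sketch can be read and imported without triggering a build.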
The following command can upload three files. The system will ask for a user name and password.
$ twine upload dist/*
When you want to update the package, you must first delete all files and directories in dist/ and build/.

PyPI Package
PyPI Stats aims to provide aggregate download information on python packages available from the Python Package Index in lieu of having to execute queries against raw download records in Google BigQuery.
In order to use PyPI Stats, you have to install it:
$ pip install -U pypistats
To run it, type the following command:
$ pypistats overall covidlag

The author created deathdaily for predicting the number of deaths in the next 7 days and scorecovid [14] for scoring individual policies against COVID-19.
deathdaily is based on a curve-fitting function. In scorecovid, the score is based on the number of deaths per population (millions). Policymakers should use the deathdaily tool as an indicator of whether their policies should be strengthened or relaxed.
scorecovid allows policymakers to learn good strategies from countries with excellent scores. Digital fences play a key role in mitigating the pandemic [15]. As of 23 November 2021, deathdaily had been downloaded by 12454 users worldwide and scorecovid by 7659 users.
The described covidlag, deathdaily, and scorecovid packages are useful tools for analyzing the COVID-19 pandemic.

Discussion
This tutorial allows researchers to submit a new PyPI package and to showcase their tools as PyPI packages around the world. All we need to do is create five files, namely xxx.py, setup.py, README.md, __main__.py, and __init__.py, by following the instructions in the Materials and Methods Section. Before submitting the new package, you should test it on your local machine.
When the result for Japan is compared with that for the United States (Figure 2), the difference in CFR between Japan and the US is clear.
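The five files can be laid out as in the following sketch, which creates an empty skeleton for local testing. The directory layout, the package name mytool, and the file contents are assumptions for illustration and are not copied from the covidlag repository; the function create_skeleton is hypothetical.

```python
import tempfile
from pathlib import Path


def create_skeleton(root, name):
    """Create the five files of a minimal package and return their relative paths."""
    root = Path(root)
    package_dir = root / name
    package_dir.mkdir(parents=True, exist_ok=True)
    files = {
        root / "setup.py": "# packaging metadata goes here\n",
        root / "README.md": f"# {name}\n",
        package_dir / "__init__.py": "",
        # __main__.py lets the package run with `python -m <name>`.
        package_dir / "__main__.py": f"from {name}.{name} import main\n\nmain()\n",
        package_dir / f"{name}.py": 'def main():\n    print("hello from the package")\n',
    }
    for path, text in files.items():
        if not path.exists():  # never overwrite existing work
            path.write_text(text, encoding="utf-8")
    return sorted(str(path.relative_to(root)) for path in files)


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = create_skeleton(tmp, "mytool")
    print(created)
```

With such a layout, running pip install . in the root directory installs the package locally, which is one way to test it before uploading.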
$ covidlag 'United States' 600 13
The latest CFR of the US is CFR = 2090/170056 = 0.0123, which is four times higher than that in Japan. The lag time of the US is 13 days, which is shorter than that in Japan.

Deathdaily.py and Scorecovid.py
This is the world's first tutorial on PyPI packaging for translational medicine or medicine in general. This tutorial allows researchers to build a new PyPI package and to showcase their tools as PyPI packages around the world. covidlag.py, setup.py, and README.md were detailed in this paper. deathdaily and scorecovid were also introduced; their source code is available at the following sites: https://github.com/ytakefuji/score-covid-19-policy/raw/main/scorecovid.py (accessed on 23 November 2021); https://github.com/ytakefuji/covid-19_daily_death_prediction/raw/main/deathdaily.py (accessed on 23 November 2021).

Conflicts of Interest:
The authors declare no conflict of interest.