A Trendline and Predictive Analysis of the First-Wave COVID-19 Infections in Malta

Following the first COVID-19 cases, Malta rapidly imposed strict lockdown measures, including restrictions on international travel together with national social distancing measures, such as the prohibition of public gatherings and the closure of workplaces. The study aimed to elucidate the effect of introducing and relaxing social distancing measures on the infection rate by means of a trendline analysis of the daily case data. In addition, the study derived a predictive model by fitting historical SARS-CoV-2 positive-case data with a two-parameter Weibull distribution, whilst incorporating swab-testing rates, to forecast the infection rate at minimal computational expense. The trendline analysis showed that the infection wave followed a tri-phasic pattern. The first phase coincided with the imposition of social distancing interventions, whereas the two latter phases emerged after public measures were relaxed, and their peaks resolved without any further escalation of national measures. The derived forecasting model accurately predicted the daily number of infected cases, attaining a high goodness-of-fit, using uncensored official government infection-rate and swabbing-rate data from the first COVID-19 wave in Malta.


Introduction
Upon the identification of a novel coronavirus in China in January 2020, infections caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) rapidly spread worldwide [1]. Coronavirus disease 2019 (COVID-19) has been reported to be highly infectious and to carry substantial morbidity and mortality [2]. Despite precautionary measures promoted by societal institutions, the impact of COVID-19 on healthcare systems has been significant, overwhelming hospital bed capacity and intensive care resources [3][4][5]. In an effort to 'flatten the curve', governments have implemented varying public health measures, which, at the most extreme, have involved lockdowns of large cities and entire countries [6,7]. These measures, however, have incurred substantial costs and required large-scale logistical efforts to balance the protection of human health against the national economy [8]. Studies have indicated that control measures have been effective [9,10], with full lockdown being the most effective means of mitigating the spread of the virus [11]. Yet, as the categories of control measures imposed by governments vary substantially, each measure may affect very different cohorts of the population. As a result, relating the spread of the virus to individual, discrete measures has been a limitation of epidemiological analyses [12,13].
To support data-driven decisions, forecasting models based on distribution fitting are commonly used to plan interventions over anticipated timespans. Such models provide quantitative projections of infection cases that allow policymakers to plan appropriate interventions [14]. Of particular importance in Malta is short-term forecasting based on hospital and swab-testing data, using models that can be solved on commonly used office machines and spreadsheet software to provide data-driven insights for logistics and decision-making.
Malta is a Mediterranean island country with a population of approximately 500,000 inhabitants. Its first SARS-CoV-2 case was reported on 7 March 2020. Following the initial case, the country experienced an epidemic until the end of July, with a peak in mid-April, in what constituted the first wave of COVID-19 in the country. The government introduced various interventions to address the challenge, the most substantial being international and national travel bans. A fundamental component of the national strategy was an exhaustive track-and-trace programme, which reached a frequency of almost 2000 swabs per 100,000 residents per week by the end of the first wave.
The daily number of new COVID-19 cases, together with the respective number of swab-tests taken, was retrieved from the official website of the Ministry of Health in Malta, as issued from 21 February 2020 onwards. The key interventions introduced by the Maltese government throughout the first wave were also identified, together with the dates on which they were implemented and discontinued; these are summarised in Table 1. Since foreign and local travel had ceased and the track-and-trace practice was effective, the virus spread data were deemed to relate solely to national cases.
To establish how the cumulative number of infections varied over the first pandemic wave in Malta, a trendline analysis was performed. First, a linear function (Equation (1)) was fitted to the case data. Subsequently, logistic functions based upon the cumulative distribution functions of the Exponential distribution (Equation (2)) and the Weibull distribution (Equations (3)-(5)) were fitted. The Weibull-based logistic function was developed further by coupling three distinct functions into a multi-logistic function.
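$$N_c = M_{c1}\left[1 - \exp\left(-\frac{n}{\lambda_{c1}}\right)\right] \tag{2}$$
$$N_c = M_{c1}\left\{1 - \exp\left[-\left(\frac{n}{\lambda_{c1}}\right)^{k_{c1}}\right]\right\} \tag{3}$$
$$N_c = \sum_{i=1}^{2} M_{ci}\left\{1 - \exp\left[-\left(\frac{n}{\lambda_{ci}}\right)^{k_{ci}}\right]\right\} \tag{4}$$
$$N_c = \sum_{i=1}^{3} M_{ci}\left\{1 - \exp\left[-\left(\frac{n}{\lambda_{ci}}\right)^{k_{ci}}\right]\right\} \tag{5}$$
where $N_c$ is the cumulative number of positively diagnosed cases, $n$ is the day number, $M_{c1}$, $M_{c2}$, $M_{c3}$ are the case-dependent magnitude parameters, $\lambda_{c1}$, $\lambda_{c2}$, $\lambda_{c3}$ are the case-dependent scale parameters, and $k_{c1}$, $k_{c2}$, $k_{c3}$ are the case-dependent shape parameters. From the cumulative functions, a trendline analysis of the daily infected cases was undertaken by taking the analytical time-derivative of each (Equations (6)-(10)).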
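$$\dot{N}_c = \frac{M_{c1}}{\lambda_{c1}}\exp\left(-\frac{n}{\lambda_{c1}}\right) \tag{7}$$
$$\dot{N}_c = \frac{M_{c1}k_{c1}}{\lambda_{c1}}\left(\frac{n}{\lambda_{c1}}\right)^{k_{c1}-1}\exp\left[-\left(\frac{n}{\lambda_{c1}}\right)^{k_{c1}}\right] \tag{8}$$
$$\dot{N}_c = \sum_{i=1}^{2}\frac{M_{ci}k_{ci}}{\lambda_{ci}}\left(\frac{n}{\lambda_{ci}}\right)^{k_{ci}-1}\exp\left[-\left(\frac{n}{\lambda_{ci}}\right)^{k_{ci}}\right] \tag{9}$$
$$\dot{N}_c = \sum_{i=1}^{3}\frac{M_{ci}k_{ci}}{\lambda_{ci}}\left(\frac{n}{\lambda_{ci}}\right)^{k_{ci}-1}\exp\left[-\left(\frac{n}{\lambda_{ci}}\right)^{k_{ci}}\right] \tag{10}$$
where $\dot{N}_c$ is the number of positive cases per day. In a similar manner, the cumulative number of swab-tests over the first wave was analysed using a linear function (Equation (11)) and a Weibull-based logistic function (Equation (12)).
$$N_s = M_{s1}\left\{1 - \exp\left[-\left(\frac{n}{\lambda_{s1}}\right)^{k_{s1}}\right]\right\} \tag{12}$$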
where $N_s$ is the cumulative number of swab-tests, $M_{s1}$ is the swab-test-dependent magnitude parameter, $\lambda_{s1}$ is the swab-test-dependent scale parameter, and $k_{s1}$ is the swab-test-dependent shape parameter. From the cumulative swab-test functions, a trendline analysis of the daily swab-tests was undertaken by taking the analytical time-derivative of each (Equations (13) and (14)).
$$\dot{N}_s = \frac{M_{s1}k_{s1}}{\lambda_{s1}}\left(\frac{n}{\lambda_{s1}}\right)^{k_{s1}-1}\exp\left[-\left(\frac{n}{\lambda_{s1}}\right)^{k_{s1}}\right] \tag{14}$$
where $\dot{N}_s$ is the number of swab-tests per day.
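A minimal Python sketch of the trendline functions follows. It assumes that each Weibull-based logistic term is a magnitude M multiplied by the Weibull cumulative distribution function 1 - exp(-(n/λ)^k), and that the multi-logistic function is the sum of such terms (k = 1 recovers the exponential-based form). The daily-rate function is the analytical derivative, M times the Weibull probability density, and it is checked here against a central finite difference. The same functions would serve the swab-test trendline with swab parameters. The function names and parameter values are hypothetical, not the fitted values of Tables 2-5.

```python
import math

def cumulative(n, params):
    """Sum of M * (1 - exp(-(n / lam) ** k)) over (M, lam, k) tuples.

    Each tuple holds a magnitude, a scale (days) and a shape; one tuple gives a
    single-logistic curve and three tuples a triple (multi-logistic) curve.
    """
    if n < 0:
        raise ValueError("day number n must be non-negative")
    return sum(M * (1.0 - math.exp(-((n / lam) ** k))) for M, lam, k in params)

def daily_rate(n, params):
    """Analytical time-derivative of `cumulative`: M times the Weibull pdf.

    Defined for n > 0 only: with k < 1 the factor (n / lam) ** (k - 1)
    diverges at n = 0.
    """
    if n <= 0:
        raise ValueError("evaluate the daily rate for n > 0")
    return sum(
        M * (k / lam) * (n / lam) ** (k - 1) * math.exp(-((n / lam) ** k))
        for M, lam, k in params
    )

# Hypothetical three-phase parameters (illustrative only, not fitted values).
phases = [(450.0, 30.0, 2.5), (120.0, 62.0, 12.0), (100.0, 95.0, 8.0)]
n, h = 40.0, 1e-4
numeric = (cumulative(n + h, phases) - cumulative(n - h, phases)) / (2 * h)
print(daily_rate(n, phases), numeric)  # the two values agree closely
```

Summing terms lets each phase have its own onset (through the scale λ) and steepness (through the shape k), which is what allows a single function to trace a multi-phase wave.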

Predictive Analysis
A prediction model was derived by utilising a logarithmic growth rate equation for the daily diagnosed cases (Equation (15)).
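A sketch of this relation, assuming Equation (15) uses the standard definition of a daily logarithmic growth rate (the natural logarithm of the ratio of consecutive cumulative counts), so that predicting the growth rate directly yields the cumulative count:

$$K_{c,n} = \ln\left(\frac{N_{c,n}}{N_{c,n-1}}\right) \quad\Longrightarrow\quad N_{c,n} = N_{c,n-1}\,e^{K_{c,n}}$$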
where $N_{c,n}$ is the cumulative number of positively diagnosed cases on a given day, $n$ is the considered day, and $K_{c,n}$ is the daily positive-case logarithmic growth rate. Since the required model output was $N_{c,n}$, $K_{c,n}$ had to be determined. To predict $K_{c,n}$ statistically, a two-parameter Weibull distribution was fitted to the logarithmic growth rates of the preceding days to establish the Weibull scale and shape parameters. As the fit was repeated daily throughout the time period, these were rolling parameters. The scale and shape parameters were then used within the inverse cumulative distribution function (the quantile function) of the Weibull distribution to establish a range for $K_{c,n}$ (Equation (16)).
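Assuming Equation (16) is the standard quantile function of a two-parameter Weibull distribution, it follows from solving $p = 1 - \exp\left[-\left(K_{c,n}/\lambda_c\right)^{k_c}\right]$ for $K_{c,n}$:

$$K_{c,n} = \lambda_c\left[-\ln(1-p)\right]^{1/k_c}$$

For example, $p = 0$ gives $K_{c,n} = 0$ (no new cases), while $p = 0.95$ gives the 95th-percentile growth rate.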
where $p$ is the occurrence probability, $\lambda_c$ is the case-dependent Weibull scale parameter, and $k_c$ is the case-dependent Weibull shape parameter.
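The rolling step can be sketched in Python: compute daily logarithmic growth rates from a cumulative series, fit a two-parameter Weibull distribution to the most recent window, and evaluate the quantile function. The fitting method is not stated above, so maximum-likelihood estimation is assumed here (the shape is found by bisection on the profile-likelihood equation). Non-positive growth rates are dropped because the Weibull support is strictly positive. All names, the window length, and the case counts are hypothetical.

```python
import math

def log_growth_rates(cumulative):
    """Daily ln(N_n / N_{n-1}); pairs involving a zero count are skipped."""
    return [math.log(b / a) for a, b in zip(cumulative, cumulative[1:]) if a > 0 and b > 0]

def fit_weibull_mle(samples):
    """Two-parameter Weibull maximum-likelihood fit; returns (scale, shape)."""
    xs = [x for x in samples if x > 0]      # Weibull support is x > 0
    if len(xs) < 2:
        raise ValueError("need at least two positive samples")
    x_max = max(xs)
    ys = [x / x_max for x in xs]            # rescale to (0, 1] to avoid overflow;
    logs = [math.log(y) for y in ys]        # the shape estimate is scale-invariant
    mean_log = sum(logs) / len(logs)

    def score(k):                           # profile-likelihood equation, increasing in k
        w = [y ** k for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-2, 1e2
    if score(hi) < 0:                       # near-constant data: cap the shape at hi
        shape = hi
    else:
        for _ in range(100):                # bisection on the monotone score
            mid = 0.5 * (lo + hi)
            if score(mid) < 0:
                lo = mid
            else:
                hi = mid
        shape = 0.5 * (lo + hi)
    scale = x_max * (sum(y ** shape for y in ys) / len(ys)) ** (1.0 / shape)
    return scale, shape

def weibull_quantile(p, scale, shape):
    """Inverse CDF: scale * (-ln(1 - p)) ** (1 / shape), for 0 <= p < 1."""
    return scale * (-math.log1p(-p)) ** (1.0 / shape)

# Hypothetical cumulative case counts (illustrative, not the Malta data).
cases = [3, 5, 6, 9, 12, 18, 21, 30, 38, 45, 52, 61, 69, 77, 85]
rates = log_growth_rates(cases)[-10:]       # rolling window of prior days
scale, shape = fit_weibull_mle(rates)
band = [weibull_quantile(p, scale, shape) for p in (0.0, 0.5, 0.95)]
# Next-day cumulative counts, assuming N_n = N_{n-1} * exp(K_n).
next_day = [cases[-1] * math.exp(k) for k in band]
print(next_day)
```

Because the fit and quantile are closed-form or one-dimensional, the whole rolling procedure stays cheap enough for an office machine.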
In addition, to account for the daily variation in swab-testing and to avoid assuming a constant swab-testing rate, a swab-test coefficient ($c_s$) was introduced and coupled with $K_{c,n}$ (Equation (17)).
The swab-test coefficient was defined as the ratio between the swab-test logarithmic growth rate on day $n$ and the average swab-test logarithmic growth rate over the considered prior days (Equation (18)).
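A sketch of Equation (18) consistent with this description, assuming the swab-test growth rate is defined in the same way as the case growth rate (natural logarithm of the ratio of consecutive cumulative counts):

$$c_s = \frac{K_{s,n}}{\dfrac{1}{t}\displaystyle\sum_{i=1}^{t} K_{s,n-i}}, \qquad K_{s,n} = \ln\left(\frac{N_{s,n}}{N_{s,n-1}}\right)$$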
where $K_{s,n}$ is the swab-test daily logarithmic growth rate, $t$ is the number of considered prior days, and $N_{s,n}$ is the cumulative number of swab-tests on a given day. By combining the preceding equations, the predictive model for the daily number of infected cases over a time period was derived (Equations (19)-(21)).
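The swab-test coefficient and its coupling with the case growth rate can be sketched in Python. The multiplicative coupling $N_{c,n} = N_{c,n-1}\,e^{c_s K_{c,n}}$ used below is an assumption for illustration; the source equations may combine the terms differently. The function names and numbers are likewise hypothetical.

```python
import math

def log_growth(cumulative):
    """Daily logarithmic growth rates ln(N_n / N_{n-1}) of a positive cumulative series."""
    return [math.log(b / a) for a, b in zip(cumulative, cumulative[1:])]

def swab_coefficient(swabs_cumulative, t):
    """Latest swab growth rate divided by the mean of the t preceding rates."""
    rates = log_growth(swabs_cumulative)
    if len(rates) < t + 1:
        raise ValueError("need at least t + 1 daily swab growth rates")
    prior_mean = sum(rates[-t - 1:-1]) / t
    if prior_mean == 0:
        raise ValueError("no swab growth over the prior window")
    return rates[-1] / prior_mean

def forecast_cases(previous_cumulative, k_case, c_s):
    """Assumed coupling: N_{c,n} = N_{c,n-1} * exp(c_s * K_{c,n})."""
    return previous_cumulative * math.exp(c_s * k_case)

# Hypothetical cumulative swab counts: testing accelerates on the last day.
swabs = [1000, 1100, 1210, 1331, 1597.2]
c_s = swab_coefficient(swabs, t=3)
print(round(c_s, 3))
print(round(forecast_cases(85, 0.05, c_s), 1))
```

A coefficient above 1 signals that testing grew faster than in the preceding days, which scales up the predicted case growth; a value below 1 scales it down.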

Positive Infected-Case Function
Applying Equations (1)-(5) to the cumulative dataset yielded the data-driven parameters detailed in Table 2. The linear function attained the lowest similarity, whereas the triple Weibull-based logistic function attained the highest, with coefficients of determination (R²) of 0.860 and 0.998, respectively. The functions were superimposed upon the dataset, as illustrated in Figure 1 (panels (c)-(e): Logistic, Equations (3)-(5)).
Table 2 (excerpt). Equation (2): 885.6, 78.32.
Implementing Equations (6)-(10) on the daily infected-case dataset yielded the data-driven parameters detailed in Table 3 (daily infected cases trendline data). The linear-derivative function attained the lowest similarity, whereas the triple Weibull-based logistic-derivative function attained the highest, with coefficients of determination (R²) of 0.0 and 0.566, respectively. The functions were superimposed upon the dataset, as illustrated in Figure 2. Furthermore, as the triple Weibull-based logistic-derivative function showed the closest correspondence, the key dates on which national measures were enforced or relaxed (see Table 1) were additionally marked on it, as illustrated in Figure 3.
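The R² of 0.0 for the linear-derivative function has a simple explanation, assuming that function was fitted to the daily data by least squares: the derivative of a straight line is a constant, the least-squares constant is the sample mean, and a mean-only predictor has R² = 0 by definition. A short Python check follows (the daily counts are illustrative, not the Malta data):

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

daily_cases = [2, 5, 9, 14, 11, 7, 4, 3]                  # illustrative only
flat = [sum(daily_cases) / len(daily_cases)] * len(daily_cases)
print(r_squared(daily_cases, flat))                       # 0.0
```

Any constant other than the mean would give a negative R², so 0.0 is the best a constant daily trendline can achieve.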

Swab-Test Function
Applying Equations (11) and (12) to the cumulative swab dataset yielded the data-driven parameters detailed in Table 4. The linear function attained the lowest similarity, whereas the Weibull-based logistic function attained the highest, with coefficients of determination (R²) of 0.901 and 0.999, respectively. The functions were superimposed upon the dataset, as illustrated in Figure 4.
Implementing Equations (13) and (14) on the daily swab-count dataset yielded the data-driven parameters detailed in Table 5. The linear-derivative function attained the lowest similarity, whereas the Weibull-based logistic-derivative function attained the highest, with coefficients of determination (R²) of 0.0 and 0.762, respectively. The functions were superimposed upon the dataset, as illustrated in Figure 5. Furthermore, the infection-rate positivity ratio was obtained by dividing Equation (10) by Equation (14), as illustrated in Figure 6, on which the key dates when national measures were enforced or relaxed (see Table 1) were additionally marked.
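A sketch of the positivity-ratio calculation, expressed in cases per hundred swab-tests as in the Discussion, is given below. It can be applied to raw daily counts or to values evaluated from the fitted daily trendlines. Using the relaxation dates of 4 May and 22 May as phase boundaries for the per-phase peak is an assumption for illustration, and all names and numbers are hypothetical.

```python
from datetime import date, timedelta

def positivity_per_hundred(daily_cases, daily_swabs):
    """Positive cases per hundred swab-tests for each day (None where no swabs)."""
    return [100.0 * c / s if s > 0 else None for c, s in zip(daily_cases, daily_swabs)]

def phase_peaks(days, ratios, boundaries):
    """Maximum ratio within each phase; `boundaries` are the start dates of phases 2, 3, ..."""
    edges = [date.min] + sorted(boundaries) + [date.max]
    peaks = []
    for start, end in zip(edges, edges[1:]):
        values = [r for d, r in zip(days, ratios) if start <= d < end and r is not None]
        peaks.append(max(values) if values else None)
    return peaks

# Hypothetical daily values spanning the assumed phase boundaries (not the Malta data).
days = [date(2020, 5, 1) + timedelta(days=i) for i in range(30)]
cases = [3 if i % 7 else 6 for i in range(30)]
swabs = [600 + 20 * i for i in range(30)]
ratios = positivity_per_hundred(cases, swabs)
print(phase_peaks(days, ratios, [date(2020, 5, 4), date(2020, 5, 22)]))
```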

Predictive Analysis
The forecasting model was applied to the entire first wave (7 March-15 July 2020) and to a portion of the second wave (16 July-31 August 2020) to assess its continuity across the two waves. Employed within one-day (Figure 7) to five-day (Figures A1-A8) prediction frameworks, the statistical model attained good agreement with the dataset, achieving a global coefficient of determination (R²) of 0.9995 to 0.9955 between the median model outputs and the actual data. Good agreement was also attained for predictions beyond five days. For one-day, two-day, and three-day forecasting, only 4.5%, 7.8%, and 12.3% of the data-points, respectively, fell outside the 0th-95th percentile prediction band. The explicit statistical modelling methodology was therefore deemed to have been validated to a high degree of accuracy.
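The band-coverage statistic quoted above, the share of observations falling outside the prediction band, can be computed as follows. This is a minimal sketch with hypothetical values, where each day's band would come from the model's lowest and 95th-percentile outputs.

```python
def fraction_outside(observed, lower, upper):
    """Share of observations lying strictly outside their [lower, upper] band."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("series must have equal length")
    outside = sum(1 for o, lo, hi in zip(observed, lower, upper) if o < lo or o > hi)
    return outside / len(observed)

# Hypothetical one-day-ahead cumulative forecasts and observations.
observed = [100, 104, 111, 113, 118]
lower = [98, 100, 104, 111, 113]
upper = [103, 107, 110, 119, 122]
print(fraction_outside(observed, lower, upper))  # 0.2
```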

Discussion
Implementing a trendline analysis upon the daily case dataset, rather than a simple moving average, allowed the dates of enforcement and relaxation measures to be superimposed, qualitatively shedding light on the effect of each measure along the trend [23]. The infection rate increased exponentially throughout the initial days of the pandemic, prior to social measures. Upon the enforcement of measures related to public transport and the closure of workplaces, sports facilities, law courts, religious places, and service outlets, the rate-of-change of daily cases diminished, indicating that the social-distancing enforcement measures inhibited the spread of the virus. The subsequent measures prohibiting public gatherings and closing educational establishments and non-essential retail/service outlets steadily decreased the rate-of-change further, with the average infection rate reaching a peak of 11.5 cases per day on 31 March.
The relaxation of public transport measures, the re-opening of non-essential retail outlets, and the increase in the permitted size of public gatherings (to a maximum of three persons) on 4 May were followed by the second phase, characterised by an exponential increase in the infection rate. The rate, however, peaked rapidly and diminished to a low value within a short timeframe of six days. The third phase became evident following the relaxation of public gathering limits (to a maximum of four persons) and the re-opening of service outlets and public places on 22 May. This phase had a lower rate-of-change than the second and levelled off by 15 July.
The onset of the second and third phases was consistent with the relaxation of social distancing measures on public gatherings [24]: as larger numbers of persons were permitted to be in close contact, the possibility of virus transmission increased. Nevertheless, the second and third peaks were much lower, and a substantial drop in the infection rate followed without any new enforcement measures. This may be attributed to society re-orienting itself towards effective social distancing [25]. The wearing of masks or visors in shops and on public transport had become legally obligatory on 4 May and may have contributed to controlling the second and third peaks [26]. A more likely hypothesis, however, is the effectiveness of test-and-trace, which by this time had been expanded and consolidated [27]. Approximately 110,000 swab-tests were performed within the first wave, with the peak median number of swab-tests per day being approximately 1200 on 18 May 2020. During the second phase, the mean swabbing frequency had increased to over 9000 swabs per week, and peak median positivity ratios of 2.34, 0.95, and 0.34 cases per hundred swabs were identified within the first, second, and third phases, respectively. Every positive case was isolated within 24 h of testing, with concurrent quarantine of significant contacts. This was possible given the low positivity rate during the second and third phases, which was below 1%. Based on the trendline, it may be argued that, as testing was increasing, had the relaxation measure of 4 May been implemented two weeks later, the infection rate would have been allowed to level off. Accordingly, the relaxation measure of 1 July could potentially have been implemented earlier.
With regard to the forecasting model, the explicit statistical model based upon the logarithmic growth rate was found to be an accurate and computationally feasible methodology, achieving a global R² of over 0.99 with a total computation time of less than five seconds on a typical office machine. The methodology solves for the cumulative number of cases on a particular day ($N_{c,n}$) by determining the logarithmic growth rate ($K_{c,n}$) through an inverse cumulative distribution (quantile) function based on a Weibull distribution, and incorporates a swab-test coefficient ($c_s$) to account for the correlation between the number of tests undertaken and the number of positive cases.
Incorporating a Weibull distribution fit was advantageous owing to its adaptability: with only two parameters, it can represent both approximately symmetric and skewed distributions, interpolating between the exponential distribution and the Rayleigh distribution. The Weibull cumulative distribution function is an explicit function whose two parameters can be estimated from a dataset, and it is hence efficiently solvable, in contrast to implicit formulations [28]. This aspect was deemed imperative, as high-end statistical approaches tend to lie beyond the statistical expertise of many medical professionals and the processing capacity of office machines and commonly used spreadsheet software [29,30].
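The special cases behind this flexibility follow directly from the Weibull cumulative distribution function $F(x) = 1 - \exp\left[-(x/\lambda)^k\right]$:

$$k = 1:\quad F(x) = 1 - e^{-x/\lambda} \qquad \text{(exponential, mean } \lambda\text{)}$$
$$k = 2:\quad F(x) = 1 - e^{-x^2/\lambda^2} = 1 - e^{-x^2/(2\sigma^2)}, \quad \sigma = \frac{\lambda}{\sqrt{2}} \qquad \text{(Rayleigh)}$$

Near $k \approx 3.6$ the skewness is close to zero, so the distribution is approximately symmetric, whereas smaller values of $k$ give right-skewed shapes.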
In addition, the model makes use of historical data on both positive cases and swab-tests, rather than relying on trends that disregard the correlation with swab-testing. The model is particularly suited to short-to-medium-term quantitative risk assessments. Furthermore, week-long facility logistics decisions, such as the number of beds in emergency wards, ventilators, and on-call staff, may be substantiated using the model. By collating and discussing different statistical modelling and prediction techniques for COVID-19, Yadav and Akhter [31] highlighted the significance of using a single distribution to fit and represent the true virus spread, so that effective data-driven policies may be made. The two-parameter Weibull methodology implemented in this work meets this requirement. The model, however, is limited by the diminishing capacity of the distribution when left-censored data are used [32,33]. Moreover, when the forecast is extended over a longer timespan, the deviation from the true result may increase substantially as a result of the fundamental approach [34]. Nonetheless, Weibull analyses are commonly used in medical statistics, as the methodology has been reported to retain accuracy even with very small datasets [35]. Indeed, within the context of SARS-CoV-2 statistical analyses, this approach has notably been applied to establish the incubation period of the virus [36,37]. As a result, a rudimentary yet accurate forecasting model was established, capable of accurately predicting positive cases across the end of one wave and the onset of the next.

Conclusions
This study presented a novel statistical model that couples swab-testing rates with a Weibull distribution fitted to historical SARS-CoV-2 positive-case logarithmic growth rates, to predict the virus infection rate and produce accurate projections through a numerically explicit framework. The model was validated using infection-rate data from Malta. Furthermore, infection trends were characterised epidemiologically using trendline analyses to evaluate the effect of social distancing enforcement and relaxation measures upon the spread of the virus within the population.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Appendix A
Figure A1. Two-day infected-case predictive output (first wave).