The Downside of Upkeep: Analysing Railway Infrastructure Maintenance Impact on Train Operations in Sweden

Efficient and seamless railway operations depend on the systematic and well-coordinated maintenance of both rolling stock and infrastructure. However, track maintenance, or 'trackwork', can cause substantial delays if it is not properly aligned with train schedules. This study investigates how trackwork influences train operations in Sweden, based on an extensive dataset comprising over 225,000 recorded instances of planned trackwork and approximately 32.6 million train passages in 2017. Multiple logistic and negative binomial regression models show that running time delays occur more often on sections with scheduled trackwork. Trains passing through scheduled trackwork have 1.43 times higher odds of experiencing a delay increase than trains that do not, and their odds of recovering delay on such a section are 11% lower. Additionally, the expected count of delay increases is 16% higher, and the expected count of delay recoveries is 4% lower, in relation to trackwork. With the amount of trackwork set to increase over the coming years, these results call attention to train scheduling and the performance of trackwork.


Introduction
Ensuring the reliability of railway operations is crucial, especially as more traffic is expected to shift to rail [1]. As train traffic increases, wear and tear on railway infrastructure components intensifies, necessitating regular track maintenance [2]. Trackwork refers to the maintenance or renewal of railway infrastructure components that requires planned, temporary capacity restrictions on the section of track where the activity takes place. Such restrictions can include complete track closures, reduced speed limits, or switching to single-track operation [3,4]. These restrictions can lead to train delays and therefore often require adjustments to train schedules [4,5]. Substantial research has addressed this issue in the fields of maintenance optimisation and train operations [6][7][8][9].
Punctuality is vital for the competitiveness of railway services, as delays can severely compromise the quality of service in both passenger and freight railway operations [10][11][12]. Punctuality refers to trains running at the scheduled time, and delay to trains running behind it. In Sweden, punctuality is measured as the percentage of trains arriving at their final destination within 5 min of the scheduled time. While punctuality is the most commonly used metric for evaluating railway operations, it reflects the delays that have accumulated throughout the journey. Delay is a measurement (in minutes) of a negative deviation from the train timetable [12,13]. Running time delay is the difference between the actual and the scheduled travel time between stations. Another important aspect linked to train punctuality is delay recovery, defined as a reduction in delay time [14].
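The quantities defined above can be related in a short sketch. It assumes a hypothetical record for one train movement between two consecutive stations, with times in minutes since midnight at one-minute precision; the names `Passage`, `running_time_delay` and `punctuality` are illustrative and do not come from the Swedish Transport Administration's data. It also assumes that "within 5 min" means at most 5 min late.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Passage:
    """Hypothetical record of one train movement between two consecutive stations.

    All times are minutes since midnight, recorded at one-minute precision.
    """
    sched_dep: int
    sched_arr: int
    actual_dep: int
    actual_arr: int


def entry_delay(p: Passage) -> int:
    """Delay when leaving the first station: positive = late, negative = early."""
    return p.actual_dep - p.sched_dep


def exit_delay(p: Passage) -> int:
    """Delay when arriving at the second station."""
    return p.actual_arr - p.sched_arr


def running_time_delay(p: Passage) -> int:
    """Actual minus scheduled running time between the two stations.

    Positive: the delay grew in the section. Negative: delay recovery.
    Equivalent to exit_delay(p) - entry_delay(p).
    """
    return (p.actual_arr - p.actual_dep) - (p.sched_arr - p.sched_dep)


def punctuality(final_arrival_delays: list[int], threshold: int = 5) -> float:
    """Percentage of trains arriving at the final destination at most
    `threshold` minutes late (early arrivals count as punctual)."""
    on_time = sum(1 for d in final_arrival_delays if d <= threshold)
    return 100.0 * on_time / len(final_arrival_delays)


p = Passage(sched_dep=600, sched_arr=620, actual_dep=603, actual_arr=627)
print(entry_delay(p), exit_delay(p), running_time_delay(p))  # 3 7 4
print(punctuality([0, 3, 5, 6, 12]))  # 60.0
```

Because running time delay equals the exit delay minus the entry delay, a negative value corresponds to delay recovery within the section.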

Railway Infrastructure Maintenance in Sweden
The Swedish Transport Administration, as the infrastructure manager, is responsible for the maintenance and renewal of railway infrastructure in Sweden [27]. Maintenance is delegated to five main maintenance companies and over 1000 subcontractors, governed by 34 different contracts. Under current regulations, maintenance contractors must conduct operational planning and request railway capacity [28], submitting applications 12 weeks before the scheduled trackwork and finalising them at least four weeks in advance. A detailed description of this process can be found in [29]. Once these applications are approved, they are recorded in the track utilisation plan. If trackwork is not aligned with train schedules during the annual capacity allocation, the likelihood of train disruptions increases [30].
Trackwork that is performed frequently to preserve the condition of the infrastructure usually lasts less than 24 h. In Sweden, this regular maintenance is referred to as "basic maintenance", which includes inspections, snow removal, switch lubrication, maintenance at level crossings, signal repair, and tamping of tracks and turnouts [7]. This paper focuses on basic infrastructure maintenance, which does not lead to prolonged track closures but does impose certain operational restrictions on train traffic.
The trackwork schedule is documented in the track utilisation plan, a digital record of all planned maintenance activities kept by the Swedish Transport Administration. However, there are no systematic digital records of the actual execution of scheduled trackwork. Although dispatchers log the trackwork that is carried out, these records are kept in logbooks and have not yet been systematically transcribed into a digital format. The present study is therefore based on the scheduled trackwork recorded in the track utilisation plan.
As highlighted by [31], the current Swedish train planning system lacks established guidelines for single-track operation during maintenance activities. Consequently, timetables are not generally expected to be adjusted in detail to scheduled trackwork. Moreover, given the large volume of trackwork, operators are not expected to cancel most of the affected trains. Nevertheless, during extensive closures, operators have the capacity to cancel or reroute trains as necessary. This situation underscores the importance of analysing the impact of planned maintenance on train operations.

Study Objectives
This research assesses the impact of trackwork on train delays. It analyses Swedish data comprising over 225,000 scheduled track maintenance events and approximately 32.6 million train passages across the country in 2017. The study addresses the following research questions: (1) To what extent does trackwork influence the probability and frequency of train delays in Sweden? (2) How does scheduled trackwork affect a train's opportunity to recover delay? Although this paper focuses on the Swedish railway system, we believe our findings are applicable across the European Union, as the railway capacity allocation process follows the same regulations [32].

Methodology
This section outlines our methodology for assessing the impact of trackwork on train delays in Sweden using two regression analyses: multiple logistic regression and negative binomial regression. It begins by presenting the datasets obtained from the Swedish Transport Administration covering the Swedish railway network in 2017 [33]. Data preparation involves combining and structuring these data to make them suitable for regression analysis. We then describe our use of multiple logistic regression to analyse the probability of train delays in relation to trackwork and other factors, followed by the application of negative binomial regression to examine the frequency of these delays. Both methods were chosen because they suit the structure of our dataset (binary delay outcomes and over-dispersed delay counts, respectively) and are relevant to railway operations analysis.

Overview of Data
The first dataset comprises the trackwork records from the track utilisation plan, detailing 225,507 instances of scheduled trackwork. Each record specifies the scheduled time, the location, and the restrictions imposed on train traffic due to maintenance. Our study focuses on basic maintenance trackwork, which is characterised by the absence of full track closures and a duration of less than 24 h.
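The selection criterion can be written as a simple filter. This is a minimal sketch that assumes each trackwork record carries a start time, an end time and a full-closure flag; the class and field names are hypothetical and are not those of the track utilisation plan.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trackwork:
    """Hypothetical trackwork record; field names are illustrative."""
    segment: str        # track segment between two stations, e.g. "Ss-Se"
    start: datetime
    end: datetime
    full_closure: bool  # True if the track is completely closed


MAX_DURATION = timedelta(hours=24)


def is_basic_maintenance(tw: Trackwork) -> bool:
    """Basic maintenance: no full track closure and a duration under 24 h."""
    return not tw.full_closure and (tw.end - tw.start) < MAX_DURATION


def select_basic(records: list[Trackwork]) -> list[Trackwork]:
    """Keep only the records that qualify as basic maintenance."""
    return [tw for tw in records if is_basic_maintenance(tw)]
```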
In the track utilisation plan, trackwork locations are identified by unique signal numbers situated along track segments that span two designated stations, marked as Ss and Se in Figure 1. The 225,507 trackwork activities listed for 2017 cover 3218 distinct track segments, each of which may include up to nine intermediate stations. Within these segments, the plan records sets of smaller trackwork activities performed at the same time in the same area. To streamline the dataset, we merged overlapping activities into single records, thereby eliminating duplication. As a result, adjacent trackwork events, such as those depicted in Figure 1 as Ss.1-Sn.1 and Sn.2-Se.2, were combined into consolidated entries, labelled as trackwork 1-2 in the figure.

The second dataset comprises the train punctuality data, extracted from the 2017 train plan. This dataset provides the scheduled and actual departure and arrival times at each station on the assigned train path, with a precision of one minute. It includes specific details for each train route, such as a unique identification number, the type of train, and the type of track (single, double, or quadruple). In total, this dataset captures 32,591,482 train observations (Figure 2). Each recorded train passage is captured as a sequence of stations along its route, giving a finer geographical resolution than the trackwork dataset (Figure 1). To integrate the datasets, we matched each unique journey in the punctuality records with the corresponding track segments between the start (Ss) and end (Se) stations on the route. Because trains may traverse several segments on their routes or bypass them entirely, mapping the 32.6 million recorded observations onto the 3218 designated segments yielded roughly 27.2 million distinct train passages (Figure 2).

We then prepared the datasets for analysis with the two regression models: multiple logistic regression and negative binomial regression. For the logistic regression, we defined two additional binary variables capturing the presence or absence of a train running time (runtime) delay increase and of a delay decrease, respectively, without altering the overall number of observations (Figure 2). For the negative binomial regression, in contrast, we aggregated the data by each unique combination of train type, track type, trackwork, train entry status, time of day, and location. We then added three new variables counting the train running time delay increases, decreases, and instances in which the delay remained unchanged.
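The merging of overlapping activities and the construction of the binary trackwork variable can be sketched as interval operations. The sketch assumes each activity has already been reduced to a (segment, start, end) triple with times in minutes, so it handles temporal overlap within a segment rather than the spatial consolidation shown in Figure 1. It also assumes that a passage counts as passing trackwork when its time in the segment overlaps a merged interval; the paper does not state the exact overlap rule, and all function names are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals; returns a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def consolidate(trackwork):
    """trackwork: iterable of (segment, start, end). Returns {segment: merged intervals}."""
    by_segment = defaultdict(list)
    for segment, start, end in trackwork:
        by_segment[segment].append((start, end))
    return {seg: merge_intervals(ivs) for seg, ivs in by_segment.items()}


def passes_trackwork(merged, segment, enter, leave):
    """Binary trackwork variable: 1 if the passage [enter, leave] overlaps
    any merged trackwork interval on the same segment, else 0."""
    for start, end in merged.get(segment, []):
        if enter < end and start < leave:
            return 1
    return 0


m = consolidate([("A-B", 100, 200), ("A-B", 150, 260), ("A-B", 300, 320)])
print(m["A-B"])  # [(100, 260), (300, 320)]
```

Sorting first lets a single pass merge each segment's activities, so consolidation costs O(n log n) per segment.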

Table 1 shows summary statistics of trackwork duration and train delay size. On average, trackwork lasted 181 min, with a wide range and a standard deviation of 207 min. The running time delay was calculated as the actual minus the scheduled train running time between the analysed stations, measured with a precision of 1 min. The observed running time delays have a mean of -0.15 min and a standard deviation of 5 min. Their range is wide, from -444 min (the earliest arrival) to a maximum of 1447 min.

The analysed 27.2 million train passages are characterised in Table 2. The number of passages was evenly distributed over the 12 months of 2017, with an average of 2.3 million train passages per month. Table 2 covers the following characteristics of the analysed passages: train subtype, track type, running time delay, trackwork, train enter status, and time of day. For each category of these variables, it lists the percentage of observations and the delay-increase observations within four thresholds (1-4 min, 5-9 min, ≥10 min, and ≥1 min). Notably, among all categories, freight trains most frequently experienced increases in running time delay. In contrast, commuter trains were less prone to such delay increases when passing the analysed section; instead, they predominantly experienced reductions in running time delay during the study period. Scheduled trackwork overlapped with about 0.4% of the train passages, whereas 99.6% of the passages did not pass through scheduled trackwork. Of the train passages, 10% were on quadruple track, 52% on double track, and 39% on single track. Our sample was composed of 81% passenger trains and 19% freight trains. In total, 29% of the train passages entered the analysed track section ahead of schedule, and 43% entered behind schedule. Interestingly, trains that entered the section ahead of schedule often encountered a subsequent increase in running time delay. Finally, 86% of the passages occurred in the daytime and 14% at night, with night-time defined (according to the Swedish labour act [34]) as the period between 22.00 and 06.00. The total number of observations in the sample is 27,182,178.
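The night-time definition and the delay-increase bands used in Table 2 can be expressed as two small helper functions. This is a sketch: which timestamp (for example, section entry) decides day or night is our assumption, as is treating 22.00 as inclusive and 06.00 as exclusive; the function names are hypothetical.

```python
from datetime import time

NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)


def is_night(t: time) -> int:
    """1 if t falls in 22.00-06.00 (the interval wraps past midnight), else 0."""
    return int(t >= NIGHT_START or t < NIGHT_END)


def delay_increase_band(running_time_delay: int):
    """Table 2-style band for a running time delay increase (minutes).

    Returns None when the delay did not increase by at least 1 min.
    The >=1 min column is the union of the three bands.
    """
    if running_time_delay >= 10:
        return ">=10"
    if running_time_delay >= 5:
        return "5-9"
    if running_time_delay >= 1:
        return "1-4"
    return None
```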

Regression Modelling
In this study, we analyse how train running time delay and delay recovery (a decrease in delay) are associated with trackwork. The control factors are train type and subtype (passenger or freight train, with subtypes of each) and train entry status (early, late, or on time) at the analysed track segment. Track type and time of day are additional control variables relevant to trackwork in this context. We develop two types of regression models: (i) multiple logistic regression to explore the probability of train running time delay, and (ii) negative binomial regression to explore the frequency of train running time delay, in relation to the presence of scheduled trackwork. In addition to the main models, which consider running time delays of at least 1 min, we performed a sensitivity analysis with running time delay thresholds of at least 5 and 10 min.
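The binary responses for the main models and the sensitivity analysis can be derived from each passage's running time delay as threshold indicators. A minimal sketch with hypothetical variable names:

```python
THRESHOLDS = (1, 5, 10)  # minutes: main model (1) and sensitivity analysis (5, 10)


def response_indicators(running_time_delay: int) -> dict:
    """Binary responses for one train passage.

    increase_ge{k}: the running time delay grew by at least k minutes.
    decrease_ge1:   the running time delay fell by at least 1 minute (delay recovery).
    """
    out = {f"increase_ge{k}": int(running_time_delay >= k) for k in THRESHOLDS}
    out["decrease_ge1"] = int(running_time_delay <= -1)
    return out
```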
Table 3 provides a statistical summary of the response variables used in the logistic and negative binomial regression models. For the logistic regression model, we consider running time delay increases and decreases of at least 1 min, with 27,182,634 observations in total. Within this model, the binary indicator of a delay increase of at least 1 min has a mean of 0.22 and a standard deviation of 0.42; that is, about 22% of passages experienced such an increase. The mean for delay decreases of the same threshold is 0.45, with a standard deviation of 0.50, reflecting that delay decreases are more frequent. The sensitivity analysis at thresholds of 5 and 10 min yields lower means, signifying fewer occurrences of longer delays. The negative binomial regression model is used for the count data because of the over-dispersion present in the delay counts. The observations for this model are counts aggregated by trackwork, track type, train subtype, train enter status, time of day, and location (Figure 2), with a total of 406,563 observations. The response variable, running time delay increase/decrease count, is the number of increased/decreased running time delays among the train passages in each aggregated group on the studied track segment. The count of running time delay increases of at least 1 min has a mean of 15 and a standard deviation of 44, indicating considerable variability. For running time delay decreases of at least 1 min, the mean count is 30, with a higher standard deviation of 100, suggesting a wider spread in the data. The sensitivity analysis for this model includes delay increases at the five- and ten-minute thresholds, with 142 and 86 instances, respectively, reflecting a marked decline in counts as the delay duration increases.
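The aggregation step and the over-dispersion that motivates the negative binomial model can be illustrated as follows. The grouping keys follow the text, but the record layout (a dict per passage with a `running_time_delay` field in minutes) is a hypothetical assumption. A Poisson model implies that the variance equals the mean, so the variance-to-mean ratio is a quick diagnostic; applied to the reported summary statistics (variance = standard deviation squared), it is far above 1.

```python
from collections import Counter
from statistics import mean, pvariance

# Grouping keys named in the text; the field names themselves are hypothetical.
KEYS = ("trackwork", "track_type", "train_subtype", "enter_status", "night", "location")


def aggregate(passages):
    """Count delay increases, decreases and unchanged delays per unique key combination."""
    counts = {}
    for p in passages:
        key = tuple(p[k] for k in KEYS)
        c = counts.setdefault(key, Counter())
        d = p["running_time_delay"]
        if d >= 1:
            c["increase"] += 1
        elif d <= -1:
            c["decrease"] += 1
        else:
            c["unchanged"] += 1
    return counts


def dispersion_ratio(values):
    """Variance-to-mean ratio; values well above 1 indicate over-dispersion."""
    return pvariance(values) / mean(values)


# Using the reported summary statistics directly (variance = sd**2):
print(round(44**2 / 15, 1))   # delay increases >= 1 min -> 129.1
print(round(100**2 / 30, 1))  # delay decreases >= 1 min -> 333.3
```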

Multiple Logistic Regression
We use multiple logistic regression models to analyse the effect of trackwork, along with other explanatory variables, on train running time delay increases (first model) and decreases (second model). Logistic regression is commonly used to study functional relationships between a categorical dependent variable and one or more independent variables [35,36]. The response variable of the first model captures the presence or absence of a train running time delay increase while passing an analysed track segment, coded as 1 and 0, respectively. In the second model, the response variable captures the presence or absence of a train delay decrease under the same circumstances, coded in the same way. The model predicts the occurrence of a running time delay increase/decrease (Y) from the explanatory variables $X_i$ described in Table 2:

$$\ln\left(\frac{P(Y = 1 \mid X)}{1 - P(Y = 1 \mid X)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_5 X_5$$

where:
• Y is the response variable capturing the presence or absence of a train running time delay increase (≥1 min) in the first model and of a running time delay decrease (≥1 min) in the second model, given the predictor variables; its possible values are 0 and 1;
• $X_1, X_2, \dots, X_5$ are the predictor variables in the model (trackwork, track type, train subtype, train enter status, and time of day, respectively);
• $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_5$ are the coefficients of the predictor variables.
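To make the model concrete, the sketch below evaluates the linear predictor and the inverse logit for one passage. Because track type, train subtype and enter status are categorical, each enters the model as a set of 0/1 dummy variables relative to a reference category. The reference categories and the coefficient dictionary are inputs supplied by the user; nothing here reproduces the study's estimates, and the function names are hypothetical.

```python
import math


def linear_predictor(beta0, coefs, passage, reference):
    """beta0 + sum of beta * x over dummy-coded predictors.

    coefs:     {(variable, level): beta} for every non-reference level
    passage:   {variable: level}, e.g. {"trackwork": 1, "track_type": "single"}
    reference: {variable: reference level}; reference levels contribute 0
    """
    eta = beta0
    for var, level in passage.items():
        if level == reference.get(var):
            continue  # reference category: all dummies for this variable are 0
        eta += coefs[(var, level)]  # KeyError signals an unknown level
    return eta


def predicted_probability(eta):
    """Inverse logit: P(Y = 1 | X) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))
```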
The explanatory variable trackwork is binary: it takes the value 1 when the train passage on the studied track segment overlaps with scheduled trackwork and 0 otherwise. Track type, train type, train enter status, and night are categorical explanatory variables representing the track type, the train subtype, whether the train is on time, early, or late, and whether the train operates at night, respectively. The time-of-day variable indicates whether the train passed the analysed line during the day (0) or at night (1). Pearson's chi-squared test was used to check the independence of the qualitative variables entering the regression model; the results indicated that all tested variables were independent. The final set of variables was selected by testing several logistic models.
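Pearson's chi-squared statistic for two categorical variables compares the observed cell counts of their contingency table with the counts expected under independence. A minimal pure-Python sketch; the closed-form p-value below is valid only for one degree of freedom (a 2×2 table), and larger tables would need a chi-squared survival function such as `scipy.stats.chi2.sf`.

```python
import math


def pearson_chi2(table):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = rows[i] * cols[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof


def p_value_1df(stat):
    """Upper-tail p-value of a chi-squared(1) statistic: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))
```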
For ease of interpretation, we converted the multiple logistic regression coefficients into odds ratios (OR). The OR measures the association between a given exposure and the outcome Y; for predictor $X_i$, holding the other predictors constant,

$$\mathrm{OR}_i = \frac{P(Y = 1 \mid X_i = 1) \,/\, \left[1 - P(Y = 1 \mid X_i = 1)\right]}{P(Y = 1 \mid X_i = 0) \,/\, \left[1 - P(Y = 1 \mid X_i = 0)\right]} = e^{\beta_i}$$

The OR therefore indicates how the odds of the event change in the presence of a particular exposure (in this case, trackwork) compared with its absence. An OR greater than 1 indicates higher odds of the event when the exposure is present, whereas an OR less than 1 indicates lower odds. This measure is particularly useful in logistic regression because it provides a clear, interpretable metric of the strength and direction of the association between each predictor and the outcome.
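The relation between a coefficient, its odds ratio and the implied change in probability can be checked numerically. The sketch takes OR = 1.43 (trackwork, delay increase ≥1 min) and, as an illustrative assumption only, uses the overall share of passages with a delay increase (0.22, Table 3) as the baseline probability without trackwork. Similarly, an OR of 0.89 corresponds to odds that are 11% lower.

```python
import math


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)


def odds_ratio(beta):
    """OR = exp(beta) for a logistic regression coefficient."""
    return math.exp(beta)


def probability_with_exposure(p0, or_):
    """Probability under exposure implied by baseline probability p0 and an odds ratio."""
    o1 = odds(p0) * or_
    return o1 / (1 + o1)


beta_trackwork = math.log(1.43)  # coefficient implied by the reported OR
p0 = 0.22                        # assumed baseline probability (illustrative)
p1 = probability_with_exposure(p0, odds_ratio(beta_trackwork))
print(round(p1, 3), round(p1 / p0, 2))  # 0.287 1.31
```

Under this assumed baseline, odds that are 1.43 times higher translate into a probability roughly 1.31 times higher. Because delay increases are common, reading the OR as "times more likely" slightly overstates the change in probability.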

Negative Binomial Regression
We employed two negative binomial regression models to analyse the relationship between the count of train running time delay increases (first model) or decreases (second model) and a set of explanatory variables. The regression coefficients were estimated using the glm.nb function of the MASS package in R (RStudio 2023.06.2). The model is:

$$\ln\left(E[Y \mid X]\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_5 X_5 + \ln(\varepsilon_i)$$

where:
• $E[Y \mid X]$ is the expected count of running time delay increases (≥1 min) in the first model and of running time delay decreases (≥1 min) in the second model, given the predictor variables;
• $X_1, X_2, \dots, X_5$ are the predictor variables in the model (trackwork, track type, train subtype, train enter status, and night, respectively);
• $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_5$ are the coefficients of the predictor variables;
• $\ln(\varepsilon_i)$ is the natural logarithm of the exposure variable for observation i.

For ease of interpretation, we computed the incidence rate ratio (IRR) from each estimated coefficient as $\mathrm{IRR} = e^{\beta_i}$. The IRR gives the proportional change in the expected count of running time delay increases or decreases associated with a one-unit change in a predictor variable, with all other variables held constant.
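The sketch below shows how the exposure offset enters the expected count, how an IRR converts into a percentage change, and the negative binomial variance function. It assumes the NB2 parameterisation used by `MASS::glm.nb`, Var(Y) = μ + μ²/θ; the function names are hypothetical.

```python
import math


def irr(beta):
    """Incidence rate ratio for a negative binomial coefficient: IRR = exp(beta)."""
    return math.exp(beta)


def percent_change(rate_ratio):
    """Percentage change in the expected count implied by an IRR."""
    return (rate_ratio - 1) * 100


def expected_count(beta0, betas, x, exposure):
    """E[Y | X] = exp(beta0 + sum(beta_i * x_i) + ln(exposure)), i.e. ln(exposure) as an offset."""
    eta = beta0 + sum(b * xi for b, xi in zip(betas, x)) + math.log(exposure)
    return math.exp(eta)


def nb2_variance(mu, theta):
    """NB2 variance mu + mu**2 / theta; it approaches the Poisson variance mu as theta grows."""
    return mu + mu ** 2 / theta


print(round(percent_change(1.16)), round(percent_change(0.96)))  # 16 -4
```

Because the offset enters with a fixed coefficient of 1, doubling the exposure doubles the expected count, so the IRRs describe rates per unit of exposure.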

Results
This paper analyses how trackwork affects train running time delays using two types of regression models. First, the multiple logistic regression model estimates the probability of train delays in relation to scheduled trackwork, accounting for other predictor variables. Second, the negative binomial regression model estimates the frequency of delay occurrences, focusing on the association between the presence of scheduled trackwork and the delays experienced by trains traversing these segments.

Train Running Time Delay Increase
We employed multiple logistic regression and negative binomial regression models to examine the association between increases in train running time delays and trackwork (Table 3), adjusting for a set of categorical independent variables (Table 2). Our sensitivity analysis examined how delay thresholds of 5 and 10 min affect this association. The statistical significance of each coefficient was assessed using the Wald chi-square test. Full summaries of these models are given in Appendix A, Tables A1 and A2.
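For a single coefficient, the Wald chi-square statistic is the squared ratio of the estimate to its standard error, compared with a chi-squared distribution with one degree of freedom. A minimal sketch; the mapping of p-values to the tables' markers (unmarked, '*', 'ns') is our reading of the table notes, and the function names are hypothetical.

```python
import math


def wald_test(beta, se):
    """Wald chi-square statistic W = (beta / se)**2 and its 1-df p-value."""
    w = (beta / se) ** 2
    p = math.erfc(math.sqrt(w / 2))  # P(chi2_1 > W) = P(|Z| > |beta / se|)
    return w, p


def significance_label(p):
    """Table marking: unmarked = 0.1% level, '*' = 1% level, 'ns' otherwise."""
    if p < 0.001:
        return ""
    if p < 0.01:
        return "*"
    return "ns"
```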
The multiple logistic regression analysis presented in Table 4 reports the odds of train running time delays of ≥1 min, ≥5 min, and ≥10 min in relation to scheduled trackwork and other operational factors. The regression coefficients are significant at the 0.1% level, except for the airport train type and the effect of trackwork on delays of at least 10 min. For delays of at least 1 min, the model shows higher odds of delay when trackwork is scheduled (OR = 1.43). This effect diminishes slightly for delays of 5 min or more (OR = 1.37) and becomes non-significant for substantial delays of at least 10 min (OR = 1.04). Track and train type play a considerable role in predicting delays. Quadruple tracks are associated with lower odds of short and moderate delays but higher odds of longer delays (OR = 1.28). Conversely, single tracks and commuter trains are consistently associated with higher odds across all delay thresholds. The analysis also indicates that unspecified passenger and high-speed trains are less likely to experience significant delays. Notably, late departures and night-time operations do not emerge as significant predictors of delay.
The negative binomial regression analysis, summarised in Table 5, investigates the frequency of train running time delays at thresholds of ≥1 min, ≥5 min, and ≥10 min, controlling for the other explanatory variables (track type, train type, train departure status, and time of day). For delays ≥1 min, the presence of trackwork slightly increases the frequency of delays (IRR = 1.16). This effect is marginally more pronounced for delays ≥5 min (IRR = 1.20) but becomes non-significant for substantial delays of ≥10 min (IRR = 0.98). Track type has a differential effect: quadruple tracks slightly reduce the frequency of shorter delays (IRR = 0.95) and reduce it more markedly for longer delays (IRR = 0.72). Single tracks and unspecified passenger train types tend to increase the frequency of delays across all thresholds.
The analysis reveals considerable variability across train types in the frequency of running time delay increases. For instance, intercity and regional trains consistently show a lower frequency of delays across all delay thresholds. In contrast, although airport trains have a non-significant effect on the shortest delays, they considerably increase the frequency of longer delays. Departure status and time of day also contribute to delay frequencies, with late departures and night-time operations showing varying degrees of influence. In Table 5, all coefficients are significant at the 0.1% level except for those marked with 'ns' (not significant).

Train Running Time Delay Decrease
We used both multiple logistic regression and negative binomial regression models to explore the opportunity for train delay reduction while traversing segments with scheduled trackwork. Detailed summaries of these models are presented in Appendix A, Tables A3 and A4. The statistical significance of each coefficient was determined using the Wald chi-square test.
The outcomes of the multiple logistic regression model are summarised in Table 6. All coefficients are significant at the 0.1% level. The model indicates that trackwork is associated with slightly lower odds of delay reduction (OR = 0.89). The quadruple track type shows notably higher odds of delay reduction (OR = 1.24). Among all train types, commuter trains have higher odds of delay reduction (OR = 1.30). Trains that depart late are more likely to reduce their delay (OR = 1.96). Time of day has a minimal effect, with night-time operations slightly less likely to reduce delays (OR = 0.95).
Table 7 presents the outcomes of the negative binomial regression model for the count of train running time delay decreases of at least 1 min. The model indicates a slight reduction in the frequency of delay reductions in the presence of trackwork (IRR = 0.96). Among track types, quadruple tracks are associated with a lower frequency of delay reduction (IRR = 0.78), while single tracks show a marginal increase (IRR = 1.05). Among train types, commuter trains reduce delays more frequently (IRR = 1.15), in contrast to airport trains, which show a notable decrease (IRR = 0.37). Departure status is a significant predictor, with late departures more frequently reducing delays (IRR = 1.26) and a similar trend for on-time departures (IRR = 1.18). Time of day does not have a statistically significant effect. In Table 7, all coefficients are significant at the 0.1% level except for those marked with 'ns' (not significant).

Discussion
In this paper, we have analysed the association between trackwork and train delays using two types of regression models: multiple logistic regression and negative binomial regression. Together, these models provide a comprehensive picture of the impact of trackwork on train delays: the logistic regression model sheds light on the probability of delay occurrences, while the negative binomial regression offers insights into their frequency.
Our study concludes that trackwork is linked to a higher frequency of delay occurrences and a higher probability of delay increase. Trains passing through sections with scheduled trackwork have 1.43 times higher odds of an increase in running time delay (≥1 min). At the same time, the expected count of instances in which train delays increase by at least 1 min is 16% higher than in scenarios without trackwork. Conversely, the opportunity for delay recovery diminishes in the presence of trackwork: the frequency of delay reduction is 4% lower, and the odds of a delay decrease are 11% lower than when there is no trackwork. The sensitivity analysis on delay size reveals a more pronounced effect for delays of 1 to 10 min, whereas the impact of trackwork on delays exceeding 10 min is not significant. This indicates that trackwork primarily contributes to smaller, more frequent delays.
Although the negative impact of scheduled trackwork on train punctuality is relatively minor, primarily causing smaller delays (1-10 min), it still affects the reliability of railway operations. This effect might be mitigated by allowing sufficient time for the trackwork to be completed while maintaining on-time performance. One strategy for achieving this is the use of "maintenance windows", which reserve capacity for trackwork before the train timetable is finalised. This allows train paths to be adapted to capacity restrictions in advance, avoiding negative impacts on performance. However, this approach has not yet been used to its full potential, and train operators may have difficulty adapting to the restrictions. Additionally, the trackwork schedule may remain uncertain [37] even close to the execution period, which can lead to schedule changes that train operators struggle to accommodate, resulting in train cancellations.
The trackwork scheduling approach examined in this study is consistent with the SERA directive [32], which is widely adopted in European Union member states. The findings of this study therefore have broad relevance and demonstrate the need for greater attention to trackwork scheduling.

Conclusions
In this paper, we have investigated the extent to which scheduled trackwork is associated with the probability and frequency of train delays. Based on 32.6 million train passages and 225,000 instances of planned trackwork throughout 2017, the paper applies two regression models: multiple logistic regression and negative binomial regression.
The results show that trackwork is associated with significantly higher odds of train delay: trains passing scheduled trackwork have 1.43 times higher odds of delays of at least 1 min and a 16% higher expected count of delay increases. Trackwork is also associated with fewer opportunities for delay recovery, with a 4% lower frequency of delay reductions and 11% lower odds of a delay decrease. The analysis highlights that trackwork predominantly affects shorter delays (1-10 min), with a negligible impact on longer delays exceeding 10 min.
Only a small share of train passages overlap with scheduled trackwork. However, the absolute number is likely to increase as both the number of trains and the amount of trackwork grow. Although this issue was not a major contributor to delays in 2017 and the analysis indicates a relatively modest impact of trackwork on train delays, we expect the issue to grow over time as trackwork activities increase in the coming years. Improved scheduling and performance strategies for trackwork may therefore help minimise conflicts between trackwork and train passages, even though the current effect is marginal. This study offers a preliminary insight into the dynamics between trackwork and train operations and suggests a measured approach to optimising trackwork scheduling to accommodate the evolving demands of the railway network.

Figure 1. Railway track segment where trackwork takes place between stations S1 and Sn.

Figure 2. Data processing workflow for train punctuality and trackwork analysis.

Table 1. Statistical summary of trackwork duration and train delay length.

Table 2. Characteristics of the analysed sample of train passages.

Table 3. Statistical summary of the analysed response variables for the logistic and negative binomial regression models.
All coefficients are significant at the 0.1% level except for those marked with '*' (significant at the 1% level) and 'ns' (not significant).

Table 5. Negative binomial regression model summary. Response variables: counts of train running time delay increases ≥1 min, ≥5 min and ≥10 min.

Table 7. Negative binomial regression model summary. Response variable: count of train running time delay decreases ≥1 min.

Table A2. Multiple logistic regression model summary. Response variables: train running time delay increase ≥5 min (0; 1) and train running time delay increase ≥10 min (0; 1). All coefficients are significant at the 0.1% level except for those marked with '*' (significant at the 1% level) and 'ns' (not significant).

Table A3. Negative binomial regression model summary. Response variable: train running time delay increase/decrease count ≥1 min. All coefficients are significant at the 0.1% level except for those marked with 'ns' (not significant).

Table A4. Negative binomial regression model summary. Response variables: train running time delay increase ≥5 min count and train running time delay increase ≥10 min count.