Selecting Network-Level Project Sections for Sustainable Pavement Management in Texas

Abstract: In recent years, the increasing gap between available funding and preservation needs has influenced district pavement engineers to select and prioritize projects to effectively use funding. However, currently, projects are often selected after an informal assessment, based on local conditions and local district engineers' experience, in the absence of a statewide systematic process. The primary objective of this study is to determine network-level project sections for effective sustainable pavement management using logistic regression analysis. A large volume of inventory data, documented using pavement-management information systems (PMIS), was used to develop the logistic regression (LR) model for selecting candidate sections. The LR model was subsequently validated using a single 50/50 split sample method. The findings of this study will assist the Austin, Texas, USA district to select and evaluate candidate projects. Furthermore, the study will eventually contribute to improved efficiency in project selection and prioritization by reducing not only the amount of time necessary to review the district PMIS data to identify project candidates, but also the potential for human error.


Introduction
The Texas Department of Transportation (TxDOT) is concerned about effectively allocating its limited resources to current pavement-preservation efforts. In practice, repairing and maintaining pavements that are in good condition costs less than repairing them after they deteriorate [1]. However, the current funding level has not kept pace with Texas' pavement-preservation needs. In recent years, the TxDOT preventive-maintenance rehabilitation funding has continuously decreased, as shown in Figure 1 [2–9].
In 2011, the Texas 2030 committee warned that, although only 13 percent of the current road miles in Texas have been rated as fair, poor, or very poor, insufficient funding for pavement maintenance will significantly and adversely affect future pavement quality. Based upon the current funding trend, nearly all of the pavements in Texas will eventually reach poor or very poor conditions [10,11]. The increasing gap between available funding and preservation needs motivated the district pavement engineers to select and prioritize projects in order to use funding effectively [12]. To enable sustainable pavement management in Texas, district pavement engineers are requested to submit a list of pavement projects to the Four-Year Pavement-Management Plan Committee (PMPC). The 25 individual district plans are combined to create the statewide Four-Year Pavement-Management Plan, which is reviewed at all levels within TxDOT, including local or lower management levels, district or network management levels, and statewide or upper management levels. When district pavement engineers suggest projects for the Four-Year Pavement-Management Plan, they must determine and prioritize which projects should be funded first. Currently, each district in Texas uses locally developed methods to select and prioritize projects. Specifically, the Austin district previously assessed and prioritized project sections based on local factors and district experience, e.g., traffic level, pavement type, distress type, and maintenance costs. Based on this assessment, projects were created by combining adjacent project sections. The projects could then be modified based on funding source, project timing, and public or political issues, to support funding allocation. In particular, the network-level assessment generally required the district pavement engineers to spend almost two months manually evaluating the inventory data stored in the Pavement Management Information System (PMIS).
TxDOT does not have standard procedures for selecting and ranking candidate projects. As a result, projects are currently selected after an informal assessment based on local conditions and local district engineers' experience, without a statewide systematic process. A data-based project-selection model that applies common rules based on district engineers' knowledge is immediately needed to achieve a rational, transparent, and effective statewide Four-Year Pavement-Management Plan.
The primary objective of this study is to determine network-level project sections for effective, sustainable pavement management using logistic regression analysis. The study will (1) develop a logistic regression model using the data documented in PMIS, (2) determine the factors that significantly affect network-level project decisions, (3) select network-level project sections, and (4) prioritize those sections for projects that will later be listed in the Four-Year Pavement Management Plan. The findings of this study will establish a solid foundation for project prioritization based on a thorough analysis of pavement needs and project evaluations. The results will assist the Austin district in selecting and evaluating candidate projects. Furthermore, the study will ultimately contribute to improving project selection and prioritization efficiency by reducing not only the amount of time necessary to review the district PMIS data to identify project candidates, but also the potential for human error.

Pavement Management Information System (PMIS)
In 1993, the Texas Department of Transportation (TxDOT) developed the Pavement Management Information System (PMIS) to manage its pavement assets and to improve the overall condition of Texas pavements [13,14]. This database is one of the largest pavement databases in the U.S., containing relevant pavement information for more than 300,000 road sections, each roughly 0.5 mi in length [15].
An annual PMIS data-collection survey is conducted at the beginning of each fiscal year from September to December to update the database with new pavement-condition and other inventory data. The pavement information stored in the system typically includes road type, location characteristics, and other indices, e.g., pavement-condition score, distress score, and ride score. The database is generally used by district pavement engineers to select pavement projects.

Logistic Regression Analysis
The logistic regression model, also referred to as a logit model, is commonly used to predict the presence or absence of an outcome from predictor variables [16]. Compared to other traditional regression techniques, logistic regression is mostly used for binomial models. The dependent variable is usually dichotomous, and the independent variables can take any form, e.g., categorical or numerical variables. Therefore, unlike linear regression, logistic regression does not require the variables to be normally distributed. The logit transformation [17,18] converts a probability measurement between 0 and 1 into values in the interval (−∞, ∞). The logit transformation is defined as

$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) \tag{1}$$

where logit(p) = the natural log of the odds, ln = the natural logarithm, and p = the probability of success.
After the dependent variable is transformed into a logit variable, it can be predicted by the independent variables using maximum likelihood estimation. In a logistic regression model, the regression coefficients (β) can be interpreted as in linear models. Thus, $\beta_k$ represents the change in the logit of the probability associated with a unit change in the kth predictor, holding all other predictors constant. The regression equation is

$$\operatorname{logit}(p) = b_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k \tag{2}$$

where logit(p) = the log odds of the dependent variable, $b_0$ = a constant, $\beta_k$ = the regression coefficient of the kth predictor, and $X_k$ = the kth independent variable.
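The logit transformation and the linear predictor described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample coefficients are assumptions made for the example and do not come from the study.

```python
import math

def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p); defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inverse_logit(z: float) -> float:
    """Map a log-odds value back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(b0, betas, xs):
    """b0 + sum of beta_k * x_k: the right-hand side of the regression equation."""
    return b0 + sum(beta * x for beta, x in zip(betas, xs))

# Hypothetical constant and coefficients, for illustration only.
z = linear_predictor(0.5, [-1.0, 0.25], [1, 0])
p = inverse_logit(z)
print(z, p)
```

A probability of 0.5 maps to a logit of 0, and each unit change in a predictor shifts the logit by that predictor's coefficient.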

Model Development Process
The main goal of this study is to prioritize network-level project sections to support the Four-Year Pavement Management Plan. To effectively achieve this goal, logistic regression analysis was conducted using inventory data recorded from the PMIS and a list of projects for Austin district included in the statewide Four-Year Pavement Management Plan. Using the logit model, a large number of project sections that consistently matched with actual projects were selected and prioritized for sustainable pavement management. Figure 2 shows an overview of the model-development process.

Factors Affecting Pavement Treatment Decisions
To identify the relevant factors for a pavement-preservation decision, a number of pavement engineers from the Austin district of TxDOT participated in the research meetings. The participants were asked to evaluate an initial list of potential factors presented by the research team. Based on expert opinions, the research team identified five relevant factors documented in the PMIS database. These factors included (1) total average daily traffic (total ADT), (2) truck ADT, (3) posted speed limit, (4) condition score, and (5) change in condition score [19–22]. Each factor can be broken down into several categories that are relevant to the district pavement engineers. These factors are defined below:
• Total average daily traffic (total ADT): volume of traffic in both lanes.
• Truck average daily traffic (truck ADT): volume of truck traffic in both lanes.
• Posted speed limit: legally assigned numerical maximum speed limit.
• Condition score (CS): description of the overall pavement condition, combining the distress score and ride quality (1 = worst condition, 100 = best condition).
• Change in condition score: change in condition score since last year (condition score in previous year − condition score in current year).

Figure 2. Process for decision-tree-based model development.

Data Source and Preparation
This study used two main data sets obtained from the TxDOT Austin district: (1) the PMIS database and (2) a list of the Austin district's pavement preservation projects for TxDOT's four-year plan. The PMIS database for the Austin district includes 8423 road sections, each roughly 0.5 mi in length, with relevant pavement information, e.g., a pavement condition summary and route characteristics. The Austin district's preliminary list of pavement preservation projects included 409 pavement maintenance projects. These project data typically consisted of sections more than one mile long, created by combining several related sections.
To link the relevant pavement information between the two data sets, the selected projects were split into 0.5-mi-long sections (network-level data), and 3800 sections were obtained. These sections were matched with those in the PMIS database, and the relevant pavement information was loaded from the PMIS database. However, missing or zero-value pavement information can result in inaccurate inference. To avoid this, the listwise deletion method was used: an entire record was excluded from the analysis if any single item of pavement information was missing or zero. After removing records with missing or zero-value pavement information, 3076 sections were identified that had been selected for projects, and 4839 sections were identified that had not been selected for projects. Table 1 summarizes the statistics of the variables included in the analysis.
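The listwise deletion step can be illustrated with a short Python sketch. The field names and the three sample records below are made up for the example (they are not PMIS column names or PMIS data), and the choice of which fields to check is likewise an assumption.

```python
def listwise_delete(records, fields):
    """Drop any record in which a listed field is missing (None) or zero."""
    kept = []
    for record in records:
        values = [record.get(field) for field in fields]
        if all(value is not None and value != 0 for value in values):
            kept.append(record)
    return kept

# Made-up sections for illustration; these are not PMIS records.
sections = [
    {"id": "A", "total_adt": 12000, "truck_adt": 900, "speed_limit": 55, "condition_score": 82},
    {"id": "B", "total_adt": None, "truck_adt": 300, "speed_limit": 45, "condition_score": 67},
    {"id": "C", "total_adt": 4000, "truck_adt": 150, "speed_limit": 35, "condition_score": 0},
]
checked_fields = ["total_adt", "truck_adt", "speed_limit", "condition_score"]
clean = listwise_delete(sections, checked_fields)
print([s["id"] for s in clean])  # only section "A" has every field present and non-zero
```

Section B is dropped for a missing value and section C for a zero value, so a single bad field removes the whole record.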

Logistic Regression Analysis
A total of 3958 sample project sections (training set: 50.0% of the total), randomly selected from the total sample (7915 sections), were used to build the logistic regression model for selecting pavement-preservation projects. The PMIS variables selected by expert input were set as the independent variables for the model, and the final decision on the Four-Year project selection was set as the dependent variable. For easy and consistent interpretation of the relative impact, the most severe categories were selected as references. The variables were dummy coded to evaluate their impact on project selection relative to the reference categories (marked * in Table 1).
Using the Statistical Package for Social Sciences (SPSS® 19.0), a logistic regression model was developed with the combined data from the PMIS database and the preliminary list of projects for TxDOT's four-year plan. In the model, the coefficients and standard errors of the parameters were estimated using the maximum-likelihood method, and predictors were evaluated at p-values less than 0.05.
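The dummy coding against a reference category described above can be sketched as follows. The study performed this step in SPSS, so the Python helper here is a stand-in for illustration only; the bin labels, and in particular the "50-70" bin, are assumptions rather than the categories of Table 1.

```python
def dummy_code(value, categories, reference):
    """Return a 0/1 indicator for every category except the reference.

    A value in the reference category is encoded as all zeros, so each
    fitted coefficient is a contrast against the reference.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != reference}

# Condition-score bins; the "50-70" bin and the label strings are assumptions.
cs_bins = ["<50", "50-70", "70-90", "90-100"]
print(dummy_code("<50", cs_bins, reference="<50"))    # all indicators are 0
print(dummy_code("70-90", cs_bins, reference="<50"))  # only the "70-90" indicator is 1
```

With the most severe category as the reference, a negative coefficient on a dummy variable means that category is less likely to be selected than the most severe one.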

Model Validation
The study adopted a cross-validation technique, which is mainly used to estimate model-generalization error. The most common sample splits are 50/50 or 2/3:1/3. In this study, a single 50/50 split sample validation was used to test the model. Therefore, after the LR model was developed with the training set, the remaining 3957 sections (test set: 50% of the total) were used to test it. This validation process indicates the degree to which the logistic regression model can be generalized beyond the sample used to fit it.
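A single 50/50 random split of the 7915 sections can be sketched as below. The helper name, the fixed seed, and the rule that the training half receives the extra record are assumptions for the example; the study does not report how its random selection was implemented.

```python
import random

def split_half(items, seed=0):
    """Randomly split items into a training half and a held-out half.

    With an odd count, the training half receives the extra item.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = (len(shuffled) + 1) // 2
    return shuffled[:cut], shuffled[cut:]

# Section indices stand in for the 7915 sections that remained after cleaning.
train, held_out = split_half(range(7915))
print(len(train), len(held_out))  # 3958 3957
```

The two halves are disjoint and together cover every section, which is what allows the held-out half to serve as an independent check on the fitted model.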

Results
The results of the logistic regression analysis showed that all the independent variables, except truck ADT, were significant predictors of project selection. Table 2 shows the logistic regression coefficient, Wald test, and odds ratio for each of the predictors. Using a 0.05 criterion of statistical significance, the total ADT, speed limit (less than 35 mph), condition score (above 50), and changes in CS (above +15) had significant effects. For the independent variables that were not significant, the coefficients did not differ significantly between selected and unselected projects. The following is the final model that was fit to the data:

The logistic regression model showed a significant relationship between the dependent variable (project/non-project) and the independent variables. In the model, the dependent variable is presented on the logit scale, which is the natural log of the odds. Accordingly, the variable estimates indicate the increase or decrease in the predicted log odds of a pavement maintenance project being selected with a one-unit increase in a predictor, holding all other predictors constant. In addition, the constant represents the expected log odds of project selection when all the predictor variables are in their (0) categories (all reference variables: above 50,000 total ADT, above 7500 truck ADT, above 55-mph speed limit, less than 50 condition score, and less than −30 change in CS).
For total ADT, for example, the odds ratio of total ADT (1) is 0.419 (Exp(−0.870)). This means the odds of being selected for a preservation project, relative to the odds of not being selected, decreased by a factor of 0.419 when a section's total ADT changed from above 50,000 (total ADT (0): reference) to less than 1000 (total ADT (1)). Conversely, the inverted odds ratio (1/Exp[B]) indicates that the odds of selecting a section with total ADT (0) for a pavement-preservation project were 2.39 times those of a section with total ADT (1).
Similarly, the odds for selecting a section with a speed limit above 55 mph were 3.7 times (1/0.267) higher than with a speed limit of less than 35 mph. Moreover, the odds for selecting a section with a condition score of less than 50 were 2.18 times (1/0.458) higher than that with a condition score between 70 and 90, and 7.75 times (1/0.129) higher than with a condition score between 90 and 100.
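The odds-ratio arithmetic quoted above can be reproduced directly from the reported values. In the sketch below, only the coefficient −0.870 and the three odds ratios are taken from the text; the function and variable names are illustrative.

```python
import math

def odds_ratio(beta):
    """Exp(B): multiplicative change in the odds relative to the reference."""
    return math.exp(beta)

# Coefficient reported for total ADT (1).
or_total_adt_1 = odds_ratio(-0.870)
print(round(or_total_adt_1, 3))      # 0.419
print(round(1 / or_total_adt_1, 2))  # 2.39

# Odds ratios reported for other dummy variables, inverted to read them
# as "reference category versus this category".
reported = {"speed limit < 35 mph": 0.267, "CS 70-90": 0.458, "CS 90-100": 0.129}
inverted = {name: 1 / value for name, value in reported.items()}
for name, value in inverted.items():
    print(name, round(value, 2))
```

Inverting an odds ratio below 1 simply restates the comparison from the point of view of the reference category.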
The odds ratios can be more easily understood when converted to the probability of project selection. Equation (4) computes the probability of being selected for a project from the log odds (logit). If the section has all reference categories, the model predicts an 89.3% chance of it being selected for a project. Holding all other predictors at their reference categories, the model predicts that 77.7% of the sections with an ADT between 10,000 and 50,000 will be selected for a pavement-maintenance project, while 68.9% of the sections with a speed limit of less than 35 mph will be selected for the project. Therefore, the probability of being selected for a project increases when the sections have a greater total ADT, higher speed limit, lower condition score, and greater negative changes in the condition score. Based on the probability of being selected for a project, each section can be prioritized using the default cut-off value of 0.5. Table 3 shows examples of sections prioritized based on their probability of being selected as a project. The following is the equation for calculating the probability of project selection:

$$P = \frac{e^{b_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}}{1 + e^{b_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k}} \tag{4}$$
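As a rough check on the figures above, the conversion in Equation (4) can be applied to the reported values. The sketch starts from the log odds implied by the 89.3% all-reference probability and adds the log of the 0.267 odds ratio for a speed limit of less than 35 mph. Because it works from rounded published values, it lands near, but not exactly on, the reported 68.9%. The `prioritize` helper, the section IDs, their probabilities, and the choice to treat a probability equal to the cut-off as selected are all assumptions for the example.

```python
import math

def probability_from_logit(z):
    """Equation (4): P = e^z / (1 + e^z), where z is the log odds."""
    return math.exp(z) / (1.0 + math.exp(z))

# Log odds implied by the reported 89.3% for an all-reference section.
baseline_logit = math.log(0.893 / 0.107)
# Add ln(odds ratio) for the "speed limit < 35 mph" dummy (Exp(B) = 0.267).
speed_logit = baseline_logit + math.log(0.267)
p_speed = probability_from_logit(speed_logit)
print(round(p_speed, 2))  # about 0.69, near the reported 68.9%

def prioritize(section_probs, cutoff=0.5):
    """Keep sections at or above the cut-off, highest probability first."""
    selected = [(sid, p) for sid, p in section_probs.items() if p >= cutoff]
    return sorted(selected, key=lambda item: item[1], reverse=True)

# Hypothetical section IDs and probabilities, for illustration only.
ranked = prioritize({"S1": 0.62, "S2": 0.91, "S3": 0.38})
print(ranked)  # [('S2', 0.91), ('S1', 0.62)]
```

Sorting by predicted probability gives a priority order among the sections that clear the cut-off, which is the form of ranking that Table 3 illustrates.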

Logistic Regression Model Validation
A single 50/50 split sample validation was used to validate the LR model. After developing the model with 50% of the samples, the remaining 3957 samples were used to test the model. Table 4 shows the correct-classification rate of the LR model compared to the results of the validation. The LR model shows a higher correct-classification rate for non-projects and a relatively lower rate for projects. The validation results, in contrast, showed somewhat similar rates for projects and non-projects, with 72.6% and 64%, respectively. Overall, the research team found that the LR model and the validation results produced a consistent overall correct-classification rate of about 70%.
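The correct-classification rates compared in Table 4 are per-class and overall accuracies. A minimal sketch of that calculation is shown below, using ten made-up labels (1 = project, 0 = non-project) chosen only to make the arithmetic easy to follow; they are not the study's data, and the function name is illustrative.

```python
def classification_rates(actual, predicted):
    """Return (project rate, non-project rate, overall rate) as fractions.

    Labels are 1 for project and 0 for non-project; both classes must
    be present in `actual`.
    """
    pairs = list(zip(actual, predicted))
    project_hits = [a == p for a, p in pairs if a == 1]
    non_project_hits = [a == p for a, p in pairs if a == 0]
    overall_hits = [a == p for a, p in pairs]
    return (
        sum(project_hits) / len(project_hits),
        sum(non_project_hits) / len(non_project_hits),
        sum(overall_hits) / len(overall_hits),
    )

# Made-up labels for illustration; these are not results from the study.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
project_rate, non_project_rate, overall_rate = classification_rates(actual, predicted)
print(project_rate, round(non_project_rate, 3), overall_rate)  # 0.75 0.667 0.7
```

Reporting the two class rates alongside the overall rate shows whether a model is favoring one class, which is the pattern the text notes for the LR model on the training data.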

Discussion
The primary purpose of this study was to develop a logistic regression model to select and prioritize project sections in support of the Four-Year Pavement Management Plan. The study was conducted with a large number of samples (7915 sections). Although this study focused on the Austin district of Texas, other districts and states can employ similar methods to select and prioritize projects. As discussed with Austin district pavement engineers, the maintenance history was a critical factor for selecting projects. The LR model partially considered the pavement maintenance history through the 'changes in condition score' factor (if maintenance occurs, the condition score increases significantly), but the correct-classification rate could be improved if the maintenance history were included directly.
The findings of this study will be used to assist district pavement engineers in evaluating pavement sections for the statewide management plan. It will also improve the project selection and prioritization efficiency by providing the LR model, which potentially reduces human errors. Moreover, the study's findings should significantly reduce the time necessary to review the district PMIS data to identify candidate projects, which could potentially maximize the budget-allocation efficiency and improve the pavement conditions. In a nutshell, the study will eventually contribute to improved efficiency in project selection and prioritization by reducing not only the amount of time necessary to review the district PMIS data to identify project candidates, but also the potential for human error. In addition, the findings of the study suggest further studies on pavement-treatment selection, by extracting expert knowledge using the inventory data. Optimizing the funding allocation for pavement-project efficiency would expand these research findings as well.

Conclusions
This study attempted to predict network-level sections that would be selected for pavement-preservation projects using logistic regression analysis. A large number of samples were used to develop a logistic regression model. The model results indicated that all of the predictors, except the truck ADT, were significant at the 95% confidence level. These predictors included the total ADT, speed limit, condition score, and changes in condition score since last year. Based on the model, the probabilities of being selected for a project are improved when the sections have a greater total ADT, higher speed limit, lower condition score, and greater negative changes in the condition score. Based on the probability of being selected for a project, each section can be prioritized using the default cut-off value of 0.5.
In addition, a single 50/50 split sample validation was used to validate the result. The validation results confirmed the LR model by producing about a 70% correct-classification rate. The LR model shows a higher correct-classification rate for non-projects and a relatively lower rate for projects, whereas the validation results showed somewhat similar rates for projects and non-projects, with 72.6% and 64%, respectively. Overall, the LR model and the validation results produced a consistent correct-classification rate of about 70%. Therefore, the findings of this study will assist the Austin, Texas, USA district to select and evaluate candidate projects.