Estimating AADT Using Statewide Traffic Data Programs: Missing Data Impact

Qureshi, Muhammad Faizan Rehman; Al-Kaisy, Ahmed

doi:10.3390/su17219896


by Muhammad Faizan Rehman Qureshi and Ahmed Al-Kaisy *

Department of Civil Engineering, Montana State University, Bozeman, MT 59715, USA

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(21), 9896; https://doi.org/10.3390/su17219896

Submission received: 16 July 2025 / Revised: 3 November 2025 / Accepted: 4 November 2025 / Published: 6 November 2025

(This article belongs to the Special Issue Advances in Data-Driven Transportation Systems: Emerging Trends, Challenges, and Applications)


Abstract

State highway agencies usually measure Annual Average Daily Traffic (AADT) using traffic data from permanent detector stations within their system-wide traffic monitoring programs. Agencies also estimate the AADT at many other locations using short-term counts. Traffic counters at the permanent stations frequently malfunction, leading to periods of inaccurate or missing data. Addressing missing data in estimating AADT by highway agencies is important for sustainable infrastructure management. This study used extensive traffic data from permanent detector stations in the state of Montana to examine the effect of missing data on the accuracy of AADT estimation. On a rotational basis, one station was used to test the accuracy of AADT estimation, while the remaining stations (training stations) were used to develop the traffic adjustment factors. Data truncation at the training stations was conducted using two sampling techniques and three scenarios of data availability. The study results showed that the increase in AADT estimation error (inaccuracy) was not linearly proportional to the increase in the amount of missing data. Given the extreme scenarios of missing data examined in this study and the relatively lower effect on AADT estimation error, it can be concluded that the current practice in treating missing data does not involve a considerable compromise in the accuracy of AADT estimation. This highlights the robustness of the current estimation practice, suggesting that it can be effectively applied in statewide traffic monitoring programs without a significant loss of accuracy.

Keywords:

traffic monitoring program; AADT; missing data; adjustment factors; permanent stations

1. Introduction

State highway and transportation agencies implement systemwide traffic monitoring programs to collect traffic data, such as vehicle speed, weight, classification, and traffic volume. The Federal Highway Administration (FHWA) has provided guidance on the planning, development, and continued operation of traffic monitoring programs for state transportation agencies. As a result, each state records traffic data as part of the traffic monitoring program to meet the FHWA traffic data reporting requirement. This data serves as the backbone of many agency programs, including transportation planning, design, operation, and management.

Traffic volume, in the form of the annual average daily traffic (AADT), serves as the critical output of traffic monitoring programs. The AADT is calculated by dividing the total traffic in both directions of travel over one full year by 365. Accurate AADT estimation is crucial for sustainable transportation planning, ensuring reliable traffic assessments and informed infrastructure management. Transportation professionals at all levels know the importance of AADT data [1]. It is a fundamental metric in many applications, such as pavement and highway design, traffic operations, transportation planning, and fuel-tax revenue projections (to name but a few). Collecting comprehensive traffic data is often impossible due to associated costs [2]. Despite the challenge, it is crucial to have a reasonably accurate estimate of the AADT; therefore, agencies allocate substantial resources to various data-collection programs [3].

There are two primary types of traffic counts: continuous counts (often called control counts) at a limited number of permanent counting sites, and short-term counts (often called coverage counts) at a larger number of temporary counting sites throughout the network. Short-term counters collect data over a period ranging from 1 to 7 days (typically 48 h) [4]. These counters provide a sample of traffic volume across a larger portion of the highway network. Short-term counts must be seasonally adjusted to better reflect traffic conditions on a typical day. In contrast, permanent traffic counters typically collect data hourly or at 15 min intervals, 7 days a week, 365 days a year. These continuous counts provide a measure of variation in traffic patterns, accounting for daily, weekly, and monthly temporal variations.

Data collection techniques vary from manual devices to advanced detectors, sensors, and data recorders. Two fundamental types of data collection systems have been used at permanent sites in a continuously monitored highway network: Automatic Traffic Recorders (ATRs) and Weigh-in-Motion (WIM) systems. Because of the harsh environment in which these systems operate, they are highly prone to failure, resulting in periods of inaccurate or missing data. Inaccurate or missing data often undermine the quality and usefulness of collected traffic data. Therefore, missing or unusable data pose a significant challenge to accurately assessing traffic volumes and adjusting traffic counts, underscoring the importance of reliable, well-maintained data collection systems.

The remainder of this paper is organized as follows. Section 2 provides a comprehensive review of relevant literature. Section 3 outlines the study motivation, objectives, and research hypothesis. Section 4 describes the overall methodological framework, followed by Section 5, which details the data sources and collection procedures. Research results and key findings are discussed in Section 6. Finally, Section 7 summarizes the major conclusions and discusses the implications and potential applications of research results.

2. Literature Review

Data collection and management methods have evolved significantly since the 1930s [5]. Summary statistics played a crucial role in the early stage of the AADT process development. In the 1930s, AADT data collection relied mainly on manual counting, which later evolved into mechanical measurement in the 1940s. The next few years saw the establishment of theoretical frameworks for AADT summary statistics and calculations. David Albright noted certain procedural uncertainties, which remained unchallenged until the late 1980s [5]. There was a lack of universal methods for calculating adjustment factors for collected traffic data [6]. Currently, many states depend on the FHWA Traffic Monitoring Guide (TMG) to obtain corrected data. The selection of sampling and AADT estimation techniques is left to the judgment of the transportation agencies.

A factoring approach is extensively used in the United States, as recommended by the AASHTO Guidelines for Traffic Data Programs [7] and the FHWA’s Traffic Monitoring Guide [8]. This approach uses permanent stations to develop group adjustment factors, which are then applied to the short-term counts to estimate AADT. Traditionally, monthly and daily adjustment factors are calculated and multiplied by short-term counts to estimate AADT.
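As a minimal sketch of the factoring approach described above, the following Python snippet scales a single 24 h short-term count by a monthly and a day-of-week factor. The factor values and the names `MONTHLY_FACTOR`, `DAILY_FACTOR`, and `estimate_aadt` are hypothetical and serve only to illustrate the arithmetic; they are not taken from any agency's factor tables.

```python
# Hypothetical group adjustment factors (invented values, for illustration only).
MONTHLY_FACTOR = {"Jan": 1.25, "Jul": 0.82}
DAILY_FACTOR = {"Mon": 1.03, "Sat": 0.95}


def estimate_aadt(short_term_count: float, month: str, day: str) -> float:
    """Scale a 24 h short-term count by its monthly and day-of-week factors."""
    return short_term_count * MONTHLY_FACTOR[month] * DAILY_FACTOR[day]


# A Saturday count taken in July, when volumes are assumed to run above average,
# is scaled down toward the annual average.
july_saturday = estimate_aadt(5000, "Jul", "Sat")
print(round(july_saturday))
```

A factor below 1 reflects a period whose typical volume exceeds the annual average, and a factor above 1 reflects the opposite.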

Macioszek and Kurek (2021) [9] provided an in-depth analysis of traffic volume variability, emphasizing that traffic flow characteristics follow consistent and recurring patterns across different temporal and spatial dimensions. The authors noted that daily traffic distribution typically exhibits two pronounced peaks—morning and afternoon—on working days. In contrast, on public holidays, the morning peak diminishes and traffic activity shifts toward afternoon and evening hours. Weekly variations were also evident, with Mondays and Fridays showing transitional patterns influenced by weekend-related trips, while midweek days (Tuesday to Thursday) represented the most stable traffic behavior. Monthly and seasonal patterns reflected climatic and travel-related influences, with lower volumes observed during winter months and higher volumes during summer and holiday periods. The study further highlighted that traffic intensity varies by road type and location, with urban corridors and intersections near city centers experiencing greater fluctuations than peripheral or arterial routes. Moreover, the kind of day—working or non-working—significantly affects travel purpose and duration, with working days dominated by commuting trips and holidays characterized by social and recreational travel.

Due to malfunctions in traffic-counting devices, such as Automatic Traffic Recorders (ATRs) and Weigh-in-Motion (WIM) stations, data for specific periods may be missed or inaccurately captured. Zhong et al. (2005) [10] conducted an in-depth analysis of the counting efficiency of Alberta’s continuous traffic counting program. The results revealed that the datasets frequently contain missing data. For instance, in 1998, over 56% of Automatic Traffic Recorders (ATRs) had missing records, and nearly 35% contained less data than required by the AASHTO’s Guidelines for Traffic Data Programs for generating group expansion (adjustment) factors [7]. In most cases, deploying additional traffic-counting devices to complete the dataset is unrealistic, and discarding a substantial portion of the successfully collected data is undesirable. Therefore, many agencies resort to estimating missing values in collected traffic counts, a technique often called data imputation. Accurate data imputation is essential to maintaining data integrity and enhancing the cost-effectiveness of traffic data monitoring programs. The AASHTO’s Guidelines for Traffic Data Programs [7] identifies two fundamental principles. The principle of base integrity requires that the original traffic measurements remain unaltered, with missing values not imputed in the base data. The principle of truth-in-data requires that highway agencies document procedures for editing traffic data, ensuring transparency and reliability in decision-making for users.

Research conducted by the New Mexico State Highway and Transportation Department concluded that 13 states had procedures for estimating missing values and completing the dataset when portable traffic devices failed. In comparison, 23 states had similar procedures when permanent devices failed [11,12]. In Vermont, data imputation is performed using data from the same day and month in the previous year. The data from the last three years serve as the basis for estimation in South Dakota. In Montana, historical data from the same site are utilized for estimation. If no significant changes in traffic patterns have occurred, the data are directly used; otherwise, an adjustment factor is applied. In Delaware, data imputation is performed using linear interpolation from adjacent months. In Oklahoma, missing data estimates are derived from data collected on the same day of the week in the same month, with a maximum 9 h gap. In Indiana, estimations are based on data from the previous year with a maximum imputation duration of 1 week. In Alberta, missing hourly volumes are not imputed; instead, historical data are used to estimate monthly average daily traffic (MADT). In Manitoba, the estimation is for the same hour and day of the week as the previous year. In London, the estimation is based on the hourly volume of the same hours and days of the previous week [10].

A study comparing different imputation methods used by traffic agencies analyzed ATR data from Alberta and Saskatchewan [10]. The findings indicated that methods incorporating additional information and employing advanced prediction models yielded significantly better results. Another study, focused on imputing traffic count data during the holiday season, highlighted the superior performance of the k-nearest neighbor (KNN) method. A further study compared several imputation methods: expectation maximization (EM), mean, k-nearest neighbor (KNN), multivariate imputation by chained equations (MICE), random forest (RF), and median [13]. The results of that study showed that using the median value for imputation was the most effective approach. However, imputation accuracy was not assessed independently; instead, it was intertwined with the accuracy of predicting hourly volumes for a subsequent year. As a result, the measured effectiveness of the different imputation techniques might be constrained by the predictive capacity of the algorithms used, such as artificial neural networks (ANNs), long short-term memory (LSTM) networks, gated recurrent units (GRUs), and recurrent neural networks (RNNs). In a more recent study, researchers devised a congestion imputation model (CIM) for traffic congestion level data, utilizing joint matrix factorization [14]. The model captured data repetition and road similarities using time-based and location-based information. Subsequently, restrictions were applied to maintain consistency over time. The findings revealed that by leveraging attributes of congestion patterns, the model could impute missing data with greater accuracy than several other widely used methods.

In Australia, a recent study on imputing missing data utilized ATR data from New South Wales [15]. A random selection of 25% of the reliable data was used to test three different methods (multivariate imputation by chained equations (MICE), random forest (RF), and extreme gradient boosting (XGBoost)), each with 13 different imputing methods. The study examined two scenarios of data omission: 25% and 100%. The results showed that the missForest method outperformed the other imputation techniques. The AADT values were computed using both the original counts before imputation and the completed counts after imputation. The AADT values derived from the imputed data were marginally higher. However, when the ADT values were plotted, the quality of the imputed data was validated, as the yearly trends exhibited a comparatively better fit [15]. Another study utilized non-motorized count (bicycle and pedestrian) data from various cities in Oregon and explored multiple imputation methods [16]. The findings indicated that the random forest (RF) method yielded the best results; however, when only a few values were missing, negative binomial regression proved more effective.

Most studies in the literature have focused on two aspects: AADT estimation and imputation methods for missing data. There is a lack of literature on the effect of missing data on the accuracy of AADT estimation.

3. Study Motivation

The accuracy of AADT estimation is expected to vary depending on the missing-data criteria used by highway agencies. This research aims to examine the impact of tolerating different levels of missing data on the accuracy of the AADT estimation. In the current practice, many agencies adopt the following criteria in estimating the AADT [15]:

At least one daily volume is necessary for each day of the week (DOW) within a month.
A minimum of 19 hourly observations must be recorded for each daily volume.
The daily traffic volume should fall within 20% of the average for that specific day of the week in the month.

Based on these criteria, only permanent stations with at least one valid daily volume for each DOW in every month are used in developing the group adjustment factors. Given the significant tolerance for missing data in the above criteria, this study aims to assess the impact of missing data on AADT estimation accuracy. Specifically, the research uses a case study to test the hypothesis that the greater the amount of missing data at the permanent ATR/WIM stations, the lower the accuracy of the AADT estimation process.
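The three criteria listed above can be sketched as a simple screening routine. This is an illustrative reading rather than any agency's actual procedure: the record layout (day of week, hours recorded, daily volume), the function names, and the choice to compute the day-of-week average only from days that already pass the 19-hour check are all assumptions.

```python
from statistics import mean

MIN_HOURS = 19    # minimum hourly observations per daily volume
TOLERANCE = 0.20  # allowed deviation from the day-of-week average in the month


def valid_days(days):
    """Filter one station-month of (dow, hours_recorded, daily_volume) records.

    dow is 0 (Monday) to 6 (Sunday). A day is kept when it has enough hourly
    observations and its volume lies within 20% of the average for that DOW.
    """
    complete = [d for d in days if d[1] >= MIN_HOURS]
    kept = []
    for dow, hours, volume in complete:
        # Assumption: the DOW average uses only days with sufficient hours.
        dow_avg = mean(v for w, _, v in complete if w == dow)
        if abs(volume - dow_avg) <= TOLERANCE * dow_avg:
            kept.append((dow, hours, volume))
    return kept


def month_is_usable(days):
    """A month qualifies when every day of the week has at least one valid day."""
    return {d[0] for d in valid_days(days)} == set(range(7))
```

For example, a month in which every Sunday record has fewer than 19 hourly observations would be rejected by `month_is_usable`, even if the other six days of the week are fully covered.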

4. Study Approach

To investigate the impact of various levels of missing data on AADT estimation, a group of ATRs/WIMs belonging to the same highway functional classification with a complete year of traffic counts (365 days) is required. Using data from the state of Montana, the “Rural Principal Arterials - Other” functional class was selected for this investigation. This class comprises non-interstate major rural highways, including intercity routes throughout the state. The primary consideration for the selection was the higher number of ATR/WIM stations with full-year traffic counts (compared to other functional classifications). Of the stations identified with full-year traffic data, one station was used for validation and testing (hereafter the testing station), while the rest were used to develop the traffic (or group) adjustment factors (hereafter the training stations). The process was repeated multiple times, ensuring that each station was used as a testing station once and included in the training stations for all other iterations. The data at the training stations were then truncated to reflect the different levels of missing data, and the AADT for the testing station was estimated and compared with the actual value. Three levels of available (or missing) data were used: one week, two weeks, and three weeks of available data in any given month. To remove any potential bias in truncating the data, two random sampling techniques were used consistently with each level of missing data:

Sampling technique I: In this technique, random days within a specific month are selected to represent the level of available (or missing) data. For instance, one week of available data would include one Monday, one Tuesday, one Wednesday, etc., all selected randomly within the month.
Sampling technique II: In this technique, the duration of the available data (one, two, or three weeks) is selected randomly within the month as a continuous period of time. This sampling approach seems more realistic, as periods when ATRs or WIMs are down or malfunctioning tend to be continuous within any given month.
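The two sampling techniques can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, and the sketch assumes that technique I draws the same number of dates for every day of the week and that technique II draws a single continuous block that fits entirely within the month.

```python
import calendar
import random


def sample_technique_1(year, month, weeks, rng):
    """Technique I: pick `weeks` random dates for each day of the week."""
    by_dow = {}
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        by_dow.setdefault(calendar.weekday(year, month, day), []).append(day)
    chosen = []
    for days in by_dow.values():
        # Every month has at least four occurrences of each day of the week,
        # so sampling one to three of them without replacement always works.
        chosen.extend(rng.sample(days, weeks))
    return sorted(chosen)


def sample_technique_2(year, month, weeks, rng):
    """Technique II: pick one random continuous block of 7 * `weeks` days."""
    n_days = calendar.monthrange(year, month)[1]
    length = 7 * weeks
    start = rng.randint(1, n_days - length + 1)  # both endpoints inclusive
    return list(range(start, start + length))


rng = random.Random(0)  # fixed seed so the draws are repeatable
scattered = sample_technique_1(2022, 7, 1, rng)   # seven scattered July dates
continuous = sample_technique_2(2022, 7, 1, rng)  # seven consecutive July dates
```

Both functions return the days of the month that are retained as available data; all other days in that month would be treated as missing.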

For each sampling technique, a large number of scenarios/simulations was considered following the law of large numbers in statistics, which states that as the number of trials increases, the mean of the results converges to the expected value [17]. In this research, 100 simulations were used to generate the Monthly Day of the Week (MDOW) adjustment factors. Therefore, this process yielded one hundred sets of MDOW adjustment factors for each iteration.

The next step is the testing/validation of the AADT estimation using actual traffic counts at the testing station for each iteration. Using the 100 sets of MDOW adjustment factors, the percent discrepancy was calculated 100 times for each day of the year (with the daily volume treated as a short-term count), and the absolute values were averaged to represent the percent discrepancy for that particular day. The final step was to find the average discrepancy in AADT estimation for each level of missing data over one full year, using the mean of the daily absolute discrepancies from the previous step.
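The validation step can be illustrated with a short sketch. The exact discrepancy formula is not given in the text, so the expression below, (actual - estimated) / actual, is an assumption chosen so that a positive value corresponds to underestimation; the function names and the data layout are likewise hypothetical.

```python
from statistics import mean


def percent_discrepancy(actual_aadt, estimated_aadt):
    """Signed percent discrepancy; positive means the AADT is underestimated."""
    return 100.0 * (actual_aadt - estimated_aadt) / actual_aadt


def mean_abs_discrepancy(actual_aadt, daily_volumes, factor_sets):
    """Average the absolute discrepancy over simulations, then over days.

    daily_volumes: list of (key, volume) pairs, where key identifies the
                   month and day of week of that daily count.
    factor_sets:   list of dicts mapping key -> adjustment factor, one dict
                   per simulation (100 dicts in the study design).
    """
    daily = []
    for key, volume in daily_volumes:
        per_simulation = [
            abs(percent_discrepancy(actual_aadt, volume * factors[key]))
            for factors in factor_sets
        ]
        daily.append(mean(per_simulation))  # discrepancy for this day
    return mean(daily)                      # discrepancy for the year
```

With two days and two simulated factor sets, for instance, the function first averages the two absolute discrepancies for each day and then averages the two daily values.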

5. Data Collection

Data were obtained from the Montana Department of Transportation (MDT) data collection program. Upon examining available ATR and WIM data from 2019 to 2023, the functional class “Rural Principal Arterials - Other” for the year 2022 was selected for use in this research. This functional class involves all non-interstate rural principal arterials, including major intercity routes. The year 2022 was chosen for its higher number of ATR/WIM stations with a full year of continuous traffic data. Specifically, in 2022, there were 65 active permanent stations (25 ATRs and 40 WIMs), but only 17 had a full year of data. Among those 17 stations, 10 belonged to the “Rural Principal Arterials - Other” classification. As stated earlier, in each iteration 9 of these 10 stations were used for training and 1 for testing. Figure 1 shows the location of the 10 ATR and WIM stations used in this research (4 ATR sites labeled as “C” and 6 WIM sites labeled as “W”). The data collected comprised 365 days of traffic volume for each vehicle class (classes 1 to 13) at the ten permanent stations. Table 1 provides a general description of the 10 permanent stations considered for the study.

6. Analysis and Results

This section summarizes the results of the study that are presented in separate subsections. Specifically, a base scenario is presented in which no missing data are considered, and it is used as a benchmark for AADT approximation evaluation. Afterward, results from the scenarios with different levels of missing data are presented.

6.1. Base Condition (No Missing Data)

The base condition assumes that the MDOW adjustment factors are developed from permanent stations with a full year of traffic data. This scenario is important because it serves as a reference for assessing discrepancies in AADT estimation across different levels of missing data. Adjustment factors for each of the nine training stations were calculated using a factoring approach (outlined in the FHWA Traffic Monitoring Guide) and then averaged into a single set of 84 MDOW adjustment factors. For the testing station, the daily total vehicle count was considered a short-term count for the day, and using adjustment factors from the training stations, AADT was calculated by multiplying the short-term count for each day by the respective adjustment factor. The AADT was estimated for each of the 365 days in the testing dataset. Since the actual AADT was already known, the percentage discrepancy between the actual and estimated AADT was calculated for each day and then averaged over the entire year. For the base condition, the mean absolute percent discrepancy between the actual AADT and the estimated AADT at the testing stations is shown in Table 2.
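A minimal sketch of how the 84 MDOW factors (12 months by 7 days of the week) might be derived and averaged across training stations is given below. It assumes each factor is the ratio of the station's AADT to the mean daily volume for that month and day of week, with the AADT taken as a simple average of daily volumes; the averaging procedure in the Traffic Monitoring Guide may differ in detail, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def mdow_factors(daily_volumes):
    """Compute {(month, weekday): factor} for one station-year.

    daily_volumes: dict mapping datetime.date -> daily volume.
    Assumed definition: factor = AADT / mean volume of that month-weekday.
    """
    aadt = mean(daily_volumes.values())
    groups = defaultdict(list)
    for day, volume in daily_volumes.items():
        groups[(day.month, day.weekday())].append(volume)
    return {key: aadt / mean(vols) for key, vols in groups.items()}


def group_factors(station_factors):
    """Average per-station factors into one set of group adjustment factors."""
    keys = station_factors[0].keys()
    return {k: mean(s[k] for s in station_factors) for k in keys}


# Synthetic station-year: weekend volumes are 20% lower than weekday volumes.
start = date(2022, 1, 1)
station = {
    start + timedelta(days=i): 800 if (start + timedelta(days=i)).weekday() >= 5 else 1000
    for i in range(365)
}
factors = mdow_factors(station)
print(len(factors))  # one factor per month and day of week
```

In this synthetic example the weekend factors exceed 1 and the weekday factors fall below 1, so multiplying any daily count by its factor recovers the station's AADT.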

6.2. Scenario 1 (Permanent Stations with One Week of Data per Month)

This scenario considered a total of 1 week of available traffic data per month, for a total of 84 days of traffic data annually—an extreme case showing the minimum data availability. As stated earlier, two random sampling techniques were applied in running the analysis, and 100 simulations of random selections were used in the analysis. Figure 2 shows the mean absolute percent discrepancy (for each iteration) using the two sampling techniques, along with the result for the base condition. This missing-data scenario (scenario 1) shows a notable increase in AADT estimation error (% discrepancy) compared to the base condition. The sampling technique I was found to have slightly lower % discrepancy compared with the sampling technique II in eight of the ten testing stations, and almost identical (no difference) in the remaining two stations.

To test whether the difference in means between the two sampling techniques was statistically significant, a two-tailed t-test was performed, and the results are shown in Table 3. The t-test revealed no statistically significant difference between the means of sampling techniques I and II at any testing station at the 95% confidence level. Therefore, it can be concluded that there is no evidence of a statistical difference between the mean percent discrepancy of the two sampling techniques. A one-tailed t-test was also conducted to compare the mean discrepancy of the base condition with the mean of scenario 1, using sampling techniques I and II, as shown in Table 3. The tests revealed no statistically significant increase in the mean discrepancy for scenario 1 at the 95% confidence level using sampling techniques I or II at all testing stations except stations A-008 and W-101.

Figure 3 shows the daily discrepancies for station A-046 between the actual and estimated AADT for sampling technique II (scenario 1). Sampling technique II was selected because it is more realistic and more likely to occur in practice than sampling technique I. A positive percent discrepancy indicates that the estimation method underestimates the AADT, while a negative percent discrepancy indicates that it overestimates the AADT. The figure shows large fluctuations in the percent discrepancy when applying adjustment factors on different days throughout the year. A careful examination of Figure 3 shows that July and August, the peak travel season, were associated mainly with positive discrepancies, along with January and December. On the other hand, negative percent discrepancies were overrepresented during the off-peak fall season from late September to late November, as well as during March.

6.3. Scenario 2 (Permanent Stations with Two Weeks of Data per Month)

Using two weeks of available data in every month of the year, the AADT approximation was analyzed, and the results are presented in this section. Figure 4 shows the mean absolute percent discrepancy in AADT estimation for the base condition and the two-week data availability using the two sampling techniques.

At a glance, it is clear that the missing data negatively impacted the AADT estimation, as evidenced by the higher percent discrepancy. Further, similar to the scenario 1 analysis, sampling technique I (random days of the week) yielded a lower AADT estimation discrepancy than sampling technique II (random period within the month) in eight of the ten testing stations.

T-tests were conducted to examine whether missing data and the sampling technique used have a significant effect on the mean discrepancy in AADT estimation. Consistent with the t-test results of scenario 1, the two-tailed t-test revealed no statistically significant difference between the mean discrepancy using sampling techniques I and II at the 95% confidence level. Further, a one-tailed t-test was also conducted to compare the mean discrepancy of the base condition with the means of sampling technique I and sampling technique II, as shown in Table 4. The tests revealed no statistically significant increase in the mean discrepancy for sampling technique I or for sampling technique II at the 95% confidence level for any of the ten testing stations.

Figure 5 shows the daily percent discrepancies for station A-046 between the actual and estimated AADT for scenario 2 using the sampling technique II. The patterns exhibited in Figure 5 are largely similar to those shown in Figure 3. Specifically, the AADT is often underestimated during the peak summer season in July and August, as well as at the start and end of the year, while AADT is overestimated in the fall months of October and November, as well as in March.

6.4. Scenario 3 (Permanent Stations with Three Weeks of Data per Month)

This scenario had the least missing data among those considered in this study. Figure 6 shows the mean absolute percent discrepancy in AADT estimation for the base condition and three weeks of data availability using the two sampling techniques.

The patterns exhibited in Figure 6 are very similar to those displayed in Figure 2 and Figure 4. Specifically, the base condition is associated with the least discrepancy in AADT estimation, and the sampling technique I showed lower AADT estimation discrepancy than sampling technique II at eight of the ten testing stations.

Similar to the previous scenarios, the two-tailed t-test revealed no statistically significant difference between the mean discrepancy of sampling technique I and that of sampling technique II at the 95% confidence level, as shown in Table 5. Therefore, it can be concluded that there is no evidence of a statistical difference between the mean percent discrepancy of the two sampling techniques at the 95% confidence level at all testing stations. A one-tailed t-test was also conducted to compare the mean of the base condition with that of scenario 3 using sampling techniques I and II. The tests revealed no statistically significant increase in mean discrepancy for scenario 3 using sampling techniques I or II at the 95% confidence level.

Figure 7 shows the daily percent discrepancies for station A-046 between the actual and estimated AADT for scenario 3 using sampling technique II. Similar to the previous scenarios, AADT underestimation occurred more often in July and August, as well as in January and December, while AADT overestimation was more notable in the fall (October and November) and in March.

The analysis of the three scenarios consistently showed that missing data had an impact on the accuracy of AADT estimation. To examine the impact of the level of missing data on AADT estimation, the mean absolute percent discrepancy between the actual and estimated AADT at all testing stations was averaged for each scenario, and the results are shown in Figure 8 with the exact values provided in Table 6. Figure 8 clearly displays a pattern of decreasing mean percent discrepancy with increasing scenario number, from scenario 1 to scenario 3. This pattern is consistent with the expectation that the greater the amount of missing data, the lower the accuracy in AADT estimation and the greater the mean absolute percentage discrepancy between the actual and estimated AADT. This observation applies to the results of the two sampling techniques used in this study.

A careful examination of Table 6 reveals another important observation with implications for AADT estimation in practice. Specifically, despite the dramatic scenarios of missing data used in the study, the increase in the mean discrepancy of AADT estimation compared to the base condition (values shown in brackets) is modest in general, ranging from 0.68% (around 31% missing data) to 6.67% (around 77% missing data). This shows that the current practice adopted by some state agencies, which requires at least one day of data for each day of the week per month, is unlikely to result in any significant compromise in AADT estimation. This is particularly true given the extreme scenarios tested in this study, i.e., the data are missing at all training stations simultaneously throughout the whole year, which is unlikely to occur in real life. The percent increase in AADT estimation error for the different levels of missing data and the two sampling techniques is shown in Figure 9.
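The approximate shares of missing data quoted above follow directly from the scenario definitions, assuming a 365-day year and 7, 14, or 21 available days in each of the 12 months. The short check below reproduces that arithmetic; the variable names are illustrative only.

```python
DAYS_PER_YEAR = 365
MONTHS = 12

missing_share = {}
for weeks in (1, 2, 3):
    available_days = 7 * weeks * MONTHS
    missing_share[weeks] = 100 * (1 - available_days / DAYS_PER_YEAR)
    print(f"{weeks} week(s) per month: {available_days} days available, "
          f"about {missing_share[weeks]:.0f}% missing")
```

This gives roughly 77%, 54%, and 31% missing data for scenarios 1, 2, and 3, respectively.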

A two-way ANOVA was conducted to examine the effects of missing data and sampling technique on the mean absolute percent discrepancy for each testing station. The results of the ANOVA for station A-046 are summarized in Table 7. The effects of missing data and sampling technique on the mean absolute percent discrepancy were not found statistically significant, with p-values of 0.272 and 0.586, respectively. Moreover, the interaction between missing data scenarios and sampling techniques was not statistically significant (p-value = 1). This indicates that the effect of sampling techniques on the mean absolute percent discrepancy did not depend on scenarios, and vice versa. Similar results were found for other testing stations.

It is also of interest to examine whether the level of missing data influences the variability of AADT approximation. Figure 10 shows the standard deviation of the mean absolute percent discrepancy for the various scenarios of missing data for station A-046 using the two sampling techniques. Results show that the highest level of missing data (scenario 1) is associated with higher standard deviation and variability in AADT estimation error. The other two levels of missing data (scenarios 2 and 3) are associated with lower variability in AADT estimation error, with the standard deviation of the mean absolute discrepancy nearly equal. Regarding the effect of the sampling technique on the variability in the discrepancy of AADT estimation, Figure 10 shows higher variability for sampling technique II with higher standard deviations for the various levels of missing data. Both sampling techniques exhibit a very similar pattern in the variability of AADT estimation error.

Figure 11 shows the mean absolute percent discrepancy for station A-046 and the coefficient of variation by month of the year, along with the ratio of the monthly average daily traffic (ADT) to the AADT. The figure clearly shows that the mean absolute percent discrepancy is higher during peak travel seasons (June–July) and during winter months with lower traffic volumes. The coefficient of variation in monthly mean percent discrepancy ranges from 0.6 to 0.9 throughout the year, except in September (highest value 1.18) and March (lowest value 0.14). The relationships shown in Figure 11 are almost identical for the three scenarios of missing data.

7. Summary and Conclusions

The annual average daily traffic (AADT) is a critical input to most transportation applications in the planning, design, and operation of highway facilities. State Departments of Transportation (DOTs) usually measure AADT at permanent ATR and WIM stations and estimate it at all other locations using short-term counts (1 to 2 days). To estimate AADT at short-term count locations, the daily and seasonal variations in traffic observed at permanent stations are applied in the form of adjustment factors. Traffic counters at the permanent ATR and WIM stations frequently malfunction due to wear and tear, physical damage, and/or harsh environmental conditions, resulting in periods of inaccurate or missing data. Addressing missing data helps ensure that AADT estimates remain robust for sustainable infrastructure management. In practice, missing data at permanent stations is tolerated when deriving the daily and seasonal adjustment factors as long as each month of the year contains at least one valid count for each day of the week, i.e., one Monday, one Tuesday, etc. This practice raises questions about the impact of the rule on AADT estimation accuracy and whether AADT estimates remain sufficiently accurate for the intended applications. The current study aims to answer these questions. The study used ATR and WIM data from the state of Montana to examine the effect of missing data on the accuracy of AADT estimation. Two random sampling techniques were used, and three scenarios of data availability were considered in the investigation: one, two, and three weeks of available data within each month. The major findings of the study are summarized below.

Study results show that missing data has a consistent directional effect on the accuracy of AADT estimation, measured using the absolute percent discrepancy between the actual and estimated AADT. This finding is consistent with the research hypothesis that the greater the amount of missing data, the less accurate the AADT estimation. However, the effect was not statistically significant in a two-way ANOVA at the 95% confidence level.
The increase in percent discrepancy for AADT estimation was not linearly proportional to the increase in the amount of missing data. Despite the dramatic scenarios of missing data used in the analysis (31% to 77%), the change in the AADT estimation error between the highest and lowest levels of missing data was on the order of 6%.
Given the extreme scenarios of missing data used in the study (all permanent stations missing significant amounts of data simultaneously) and the relatively small effect on the percent discrepancy in AADT estimation (an increase in estimation error of less than 7% relative to the base condition for the most extreme scenario; Table 6), it is reasonable to conclude that the current practice in treating missing data does not substantially compromise the accuracy of AADT estimation. This increase remains within the ±10% error tolerance commonly referenced in FHWA’s Traffic Monitoring Guide for planning-level AADT estimation, though such discrepancies may be more critical in design or safety applications [8]. The finding also suggests that data containing at least one of each day of the week for each month can be used to develop daily and seasonal adjustment factors (i.e., MDOW factors) without the need to impute missing data. Based on the results, at least one of each day of the week (DOW) per month appears to be the minimum acceptable threshold for reliable AADT estimation. This level of data coverage maintains accuracy within acceptable limits while capturing the necessary traffic patterns. Below this threshold, the reliability of AADT estimates may diminish, especially for design and safety-related applications that require higher precision.
Sampling technique I of selecting random days within the month was associated with a lower % discrepancy in AADT estimation compared with sampling technique II of selecting a random period within the month (one, two, or three weeks).
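The minimum-coverage rule discussed in the findings above (every month must contain at least one valid count for each day of the week) can be expressed as a simple check. The sketch below is illustrative only: the function names and the example dates are hypothetical and do not represent the procedure used by any agency or by the authors.

```python
from datetime import date


def months_missing_a_weekday(available_days, year):
    """Return {month: missing weekday numbers} for months that fail the rule.

    Weekday numbers follow datetime's convention: Monday=0 ... Sunday=6.
    An empty result means every month has at least one of each weekday.
    """
    seen = {month: set() for month in range(1, 13)}
    for d in available_days:
        if d.year == year:
            seen[d.month].add(d.weekday())
    all_weekdays = set(range(7))
    return {
        month: sorted(all_weekdays - weekdays)
        for month, weekdays in seen.items()
        if weekdays != all_weekdays
    }


def meets_minimum_rule(available_days, year):
    """True when each month of the year has at least one of each weekday."""
    return not months_missing_a_weekday(available_days, year)


# Hypothetical station with valid data only on the first 7 days of each month.
first_weeks = [date(2023, m, d) for m in range(1, 13) for d in range(1, 8)]
print(meets_minimum_rule(first_weeks, 2023))

# Dropping a single day breaks the rule for that month.
with_gap = [d for d in first_weeks if d != date(2023, 3, 1)]
print(months_missing_a_weekday(with_gap, 2023))
```

Any seven consecutive days cover all seven weekdays, so the first case passes; in the second case, March is flagged because the removed day was its only Wednesday.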

The evaluation in this study was conducted using data exclusively from the state of Montana. The highway system in Montana is mainly rural and subject to pronounced seasonal variation, influenced by both tourism activity (e.g., travel to Yellowstone and Glacier National Parks) and harsh winter weather. These characteristics may shape the adjustment factors and the magnitude of estimation errors observed. In contrast, other states may have different traffic patterns: states with highly urbanized networks (e.g., New York, California, Illinois) may exhibit more stable, less seasonal patterns. Further, states with milder climates (e.g., Texas) may experience year-round travel activity with limited winter-related mobility disruptions. Therefore, the authors recommend using data from other states to provide a comprehensive assessment, as traffic patterns vary substantially across regions.

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: A.A.-K.; data collection: M.F.R.Q.; analysis and interpretation of results: M.F.R.Q. and A.A.-K.; draft manuscript preparation: M.F.R.Q. and A.A.-K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank Peder Jerstad of the Montana Department of Transportation (MDT) for his help in traffic data collection for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AADT	Annual Average Daily Traffic
MDOW	Monthly Day of the Week
MDT	Montana Department of Transportation
FHWA	Federal Highway Administration
ATR	Automatic Traffic Recorder
WIM	Weigh-in-Motion

References

1. Sharma, S.C.; Lingras, P.; Liu, G.X.; Xu, F. Estimation of Annual Average Daily Traffic on Low-Volume Roads: Factor Approach Versus Neural Networks. Transp. Res. Rec. J. Transp. Res. Board 2000, 1719, 103–111.
2. Zhao, F.; Chung, S. Contributing Factors of Annual Average Daily Traffic in a Florida County: Exploration with Geographic Information System and Regression Models. Transp. Res. Rec. J. Transp. Res. Board 2001, 1769, 113–122.
3. Sun, X.; Das, S. Estimating Annual Average Daily Traffic for Low-Volume Roadways: A Case Study in Louisiana. In Proceedings of the 12th International Conference on Low-Volume Roads, Kalispell, MT, USA, 15–18 September 2019; Available online: https://trid.trb.org/View/1689977 (accessed on 15 February 2025).
4. Robichaud, K.; Gordon, M. Assessment of Data-Collection Techniques for Highway Agencies. Transp. Res. Rec. J. Transp. Res. Board 2003, 1855, 129–135.
5. Albright, D. History of estimating and evaluating annual traffic volume statistics. Transp. Res. Rec. J. Transp. Res. Board 1991, 1305, 103–107.
6. Fekpe, E.; Gopalakrishna, D.; Middleton, D. Highway Performance Monitoring System Traffic Data for High-Volume Routes: Best Practices and Guidelines; Final Report; Office of Highway Policy Information Federal Highway Administration U.S. Department of Transportation: Washington, DC, USA, 2004.
7. Vandervalk-Ostrander, A.; American Association of State Highway and Transportation Officials; United States Federal Highway Administration; National Cooperative Highway Research Program. AASHTO Guidelines for Traffic Data Programs; PB-American Association of State Highway and Transportation Officials: Washington, DC, USA, 2009; Volume 1, Available online: http://dl1.wikitransport.ir/book/AASHTO_Guidelines_for_Traffic_Data_Programs_2009.pdf (accessed on 10 December 2024).
8. Federal Highway Administration (FHWA). Traffic Monitoring Guide. 2022. Available online: https://rosap.ntl.bts.gov/view/dot/74643 (accessed on 10 December 2024).
9. Macioszek, E.; Kurek, A. Road Traffic Distribution on Public Holidays and Workdays on Selected Road Transport Network Elements. Transp. Probl. 2021, 16, 127–138.
10. Zhong, M.; Sharma, S.; Liu, Z. Assessing Robustness of Imputation Models Based on Data from Different Jurisdictions: Examples of Alberta and Saskatchewan, Canada. Transp. Res. Rec. J. Transp. Res. Board 2005, 1917, 116–126.
11. Albright, D. 1990 Survey of Traffic Monitoring Practices Among State Transportation Agencies of the United States. 1990. Available online: https://api.semanticscholar.org/CorpusID:168276711 (accessed on 20 January 2025).
12. Albright, D. An imperative for, and current progress toward, national traffic monitoring standards. ITE J.-Inst. Transp. Eng. 1991, 61. Available online: https://api.semanticscholar.org/CorpusID:107206642 (accessed on 20 January 2025).
13. Khan, Z.; Khan, S.M.; Dey, K.; Chowdhury, M. Development and Evaluation of Recurrent Neural Network-Based Models for Hourly Traffic Volume and Annual Average Daily Traffic Prediction. Transp. Res. Rec. J. Transp. Res. Board 2019, 2673, 489–503.
14. Jia, X.; Dong, X.; Chen, M.; Yu, X. Missing data imputation for traffic congestion data based on joint matrix factorization. Knowl.-Based Syst. 2021, 225, 107114.
15. Shafique, M.A. Imputing Missing Data in Hourly Traffic Counts. Sensors 2022, 22, 9876.
16. Roll, J. Daily Traffic Count Imputation for Bicycle and Pedestrian Traffic: Comparing Existing Methods with Machine Learning Approaches. Transp. Res. Rec. J. Transp. Res. Board 2021, 2675, 1428–1440.
17. Mendenhall, W. Introduction to Probability and Statistics, 14th ed.; Duxbury Press: Duxbury, MA, USA, 1986; Available online: https://books.google.com/books/about/Introduction_to_Probability_and_Statisti.html?id=fQsKAAAAQBAJ (accessed on 28 December 2024).

Figure 1. Study area showing selected ATR and WIM stations.

Figure 2. Average absolute percent discrepancy for AADT estimation using base condition and scenario 1.

Figure 3. Daily variation in percent discrepancies (for station A-046) using sampling technique II—Scenario 1. (blue: positive discrepancy, orange: negative discrepancy).

Figure 4. Average absolute percent discrepancies for AADT estimation using base condition and scenario 2.

Figure 5. Daily variation in percent discrepancies (for station A-046) using sampling technique II—scenario 2. (blue: positive discrepancy, orange: negative discrepancy).

Figure 6. Average absolute percent discrepancies for AADT estimation using base condition and scenario 3.

Figure 7. Daily variation in percent discrepancies (for station A-046) using sampling technique II—scenario 3. (blue: positive discrepancy, orange: negative discrepancy).

Figure 8. Mean absolute percent discrepancy (for ten testing stations) by missing data scenario and sampling technique.

Figure 9. Percent increase in AADT estimation error compared to the base condition.

Figure 10. Standard deviation of absolute percent discrepancy by level of missing data and sampling technique.

Figure 11. Relationship between mean and CV of monthly absolute percent discrepancies (for station A-046) for sampling technique II with monthly ADT/AADT for testing station.

Table 1. General description of the 10 permanent stations.

Station	Road	County	No. of Lanes	Lane Width (ft)
A-008	US 89	Cascade	4	12
A-046	US 2	Hill	2	12
W-101	US 287/US 12	Broadwater	2	12
W-110	MT Hwy 3	Broadwater	2	12
W-115	US 87	Chouteau	2	12
W-116	US 2	Hill	3	12
W-132	MT 200	Missoula	2	12
W-144	US 87	Yellowstone	2	12
W-147	US 2	Lincoln	2	12
W-149	MT 200	Lewis and Clark	2	12

Table 2. Average absolute percent discrepancy for AADT estimation using base condition.

Stations	A-008	A-046	W-101	W-110	W-115	W-116	W-132	W-144	W-147	W-149
Absolute Average Percent Discrepancy	7.71	10.43	9.79	9.37	12.26	11.30	16.31	13.82	10.90	21.11

Table 3. p-values from the t-test for testing the significant difference for scenario 1.

	One-Tailed t-Test		Two-Tailed t-Test
Stations	Base Condition vs. Sampling Technique I	Base Condition vs. Sampling Technique II	Sampling Technique I vs. Sampling Technique II
A-008	0.030	0.019	0.818
A-046	0.108	0.071	0.802
W-101	0.078	0.043	0.759
W-110	0.111	0.085	0.877
W-115	0.135	0.104	0.879
W-116	0.096	0.079	0.911
W-132	0.469	0.470	0.997
W-144	0.146	0.104	0.836
W-147	0.112	0.123	0.955
W-149	0.497	0.515	0.963

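Values below 0.05 (underlined in the original table) indicate a significant difference at the 95% confidence level: A-008 for both base-condition comparisons and W-101 for the base condition vs. sampling technique II.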

Table 4. p-values from the t-test for testing the significant difference for scenario 2.

	One-Tailed t-Test		Two-Tailed t-Test
Stations	Base Condition vs. Sampling Technique I	Base Condition vs. Sampling Technique II	Sampling Technique I vs. Sampling Technique II
A-008	0.317	0.213	0.737
A-046	0.374	0.263	0.752
W-101	0.321	0.231	0.783
W-110	0.341	0.331	0.978
W-115	0.365	0.319	0.891
W-116	0.334	0.305	0.934
W-132	0.505	0.516	0.978
W-144	0.361	0.311	0.889
W-147	0.358	0.395	0.923
W-149	0.507	0.527	0.959

Table 5. p-values from the t-test for testing the significant difference for scenario 3.

	One-Tailed t-Test		Two-Tailed t-Test
Stations	Base Condition vs. Sampling Technique I	Base Condition vs. Sampling Technique II	Sampling Technique I vs. Sampling Technique II
A-008	0.427	0.327	0.788
A-046	0.449	0.333	0.758
W-101	0.427	0.310	0.754
W-110	0.444	0.395	0.901
W-115	0.449	0.365	0.827
W-116	0.441	0.375	0.866
W-132	0.551	0.527	0.948
W-144	0.448	0.369	0.839
W-147	0.453	0.490	0.926
W-149	0.594	0.536	0.928

Table 6. Mean absolute percent discrepancies (aggregated across all 10 testing stations).

	Absolute Average % Discrepancies
	Scenario 1	Scenario 2	Scenario 3
% Missing Data	76.98	53.97	30.95
Sampling Technique I	13.017 [5.83] *	12.517 [1.76]	12.383 [0.68]
Sampling Technique II	13.121 [6.67]	12.603 [2.46]	12.496 [1.59]

* Value in brackets is the percent increase in AADT estimation error compared to the base condition.
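The bracketed values and the percent-missing row in Table 6 can be approximately reproduced with simple arithmetic. This sketch assumes that the base condition is the simple mean of the ten station values in Table 2, that each "week" of available data is 7 days in each of 12 months, and that the year has 365 days; none of these assumptions is stated explicitly in the source, and the function names are illustrative.

```python
# Base-condition discrepancies for the ten stations, copied from Table 2.
base_by_station = [7.71, 10.43, 9.79, 9.37, 12.26, 11.30, 16.31, 13.82, 10.90, 21.11]
base_mean = sum(base_by_station) / len(base_by_station)

# Mean absolute percent discrepancies from Table 6 (scenarios 1, 2, 3).
table6 = {
    "Sampling Technique I": [13.017, 12.517, 12.383],
    "Sampling Technique II": [13.121, 12.603, 12.496],
}


def percent_increase(value, base):
    """Relative increase of `value` over `base`, in percent."""
    return 100.0 * (value - base) / base


def percent_missing(weeks_available, days_in_year=365):
    """Share of the year without data when each month keeps N weeks of data.

    Assumes 7 days per week in each of 12 months (an assumption of this sketch).
    """
    available_days = weeks_available * 7 * 12
    return 100.0 * (1.0 - available_days / days_in_year)


increases = {
    name: [percent_increase(v, base_mean) for v in values]
    for name, values in table6.items()
}
missing = [percent_missing(weeks) for weeks in (1, 2, 3)]

print(f"base mean = {base_mean:.2f}")
for name, values in increases.items():
    print(name, [round(v, 2) for v in values])
print("percent missing:", [round(m, 2) for m in missing])
```

Under these assumptions, the results match the published Table 6 values to within about 0.01, with the small differences most likely due to rounding.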

Table 7. Results for the two-way ANOVA test.

	DF	Sum Sq	Mean Sq	F Value	Pr (>F)
Missing Data Scenario	2	181	90.47	1.302	0.272
Sampling Technique	1	21	20.64	0.297	0.586
Sampling Technique: Scenario	2	0	0.03	0	1
Residuals	2184	151,815	69.51


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

