Machine Learning Modeling of Household Trip Generation by State Using NHTS Data
Abstract
1. Introduction
2. Literature Review
2.1. Conceptual Framework of Household Trip Generation
2.1.1. Demographic Characteristics
2.1.2. Economic Characteristics
2.1.3. Mobility and Locational Characteristics
2.1.4. Education, Work Patterns, and Health in Trip Generation
2.2. Evolution of Trip Generation Models
2.3. Addressing Spatial Heterogeneity in Trip Generation Research
3. Methodology
3.1. Data, Sample, and Variables
3.1.1. Data Source and Sample
3.1.2. Dependent and Independent Variables
- Demographic characteristics
- Economic characteristics
- Mobility and locational characteristics
- Other social and individual characteristics
3.1.3. Handling Categorical Variables
3.2. Phase I: Comparative Benchmarking of Models
3.2.1. Linear Regression
3.2.2. Random Forest
3.2.3. CatBoost
3.3. Phase II: Spatial Heterogeneity Analysis Using State-Level Models
- Nationwide Model: A single linear regression model is trained on the full dataset (all states combined). This model identifies “global drivers” of trip generation and serves as a baseline for comparison.
- State-Level Models: Separate linear regression models are trained for each of the 51 geographic units (the 50 states and the District of Columbia), yielding state-specific coefficients and p-values for every predictor. This approach, central to the paper’s novelty, produces 51 distinct effect sizes per variable, revealing regional patterns.
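The two-tier setup above can be illustrated with a minimal sketch. It assumes a single predictor (HHSIZE) and invented toy records; the helper `fit_simple_ols` is hypothetical, not the authors' code, and the paper's models use the full predictor set rather than one variable.

```python
from collections import defaultdict
from statistics import mean


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy records (state, HHSIZE, daily household trips); all values are invented.
records = [
    ("CA", 1, 4.0), ("CA", 2, 7.0), ("CA", 4, 13.0),
    ("TX", 1, 5.0), ("TX", 3, 9.0), ("TX", 5, 13.0),
]

# Nationwide model: one fit on all households pooled together.
national = fit_simple_ols([r[1] for r in records], [r[2] for r in records])

# State-level models: one independent fit per state.
by_state = defaultdict(list)
for state, hhsize, trips in records:
    by_state[state].append((hhsize, trips))

state_models = {
    state: fit_simple_ols([x for x, _ in rows], [y for _, y in rows])
    for state, rows in by_state.items()
}

for state, (intercept, slope) in sorted(state_models.items()):
    print(f"{state}: intercept={intercept:.2f}, HHSIZE slope={slope:.2f}")
print(f"US: intercept={national[0]:.2f}, HHSIZE slope={national[1]:.2f}")
```

In this toy case the pooled slope falls between the two state slopes, which is the kind of averaging that a single nationwide coefficient can hide.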
3.4. Analytical Techniques and Visualization
4. Results
4.1. Benchmarking Results
4.2. National Linear Model Results
4.3. Spatial Heterogeneity Analysis
4.3.1. Coefficient Consistency Analysis: Identifying Core and Unstable Variables
4.3.2. Visualization of Geographic Patterns
5. Discussion
5.1. The Accuracy–Interpretability Paradox: Returning to Transparency
5.2. Dissecting Spatial Heterogeneity: A New Classification of Trip Generation Factors
5.3. Theoretical Implications: Toward a Place-Based Theory of Travel Behavior
6. Conclusions
- Prioritizing Interpretable Models: In public policy contexts, transparent models such as linear regression, which expose the underlying mechanisms, can be more valuable than complex black-box models that merely offer higher predictive accuracy.
- Identifying Factor Stability: Not all trip generation factors are created equal. Policymakers should distinguish between stable, fundamental factors (such as demographic structure) and volatile, context-dependent factors (such as income).
- The Need for Place-Based Planning: Transportation policies should be designed based on a precise understanding of each region’s unique characteristics. A pricing policy that proves effective in one state may be ineffective in another, and investment in public transportation can yield vastly different returns depending on the location.
- Longitudinal Analysis: Use panel data to examine how changes in policies or economic conditions over time affect trip generation coefficients across different states.
- Multilevel Modeling: Apply hierarchical models that simultaneously capture variation at the household, county, and state levels to more precisely disentangle the sources of spatial heterogeneity.
- Integration of Spatial Datasets: Incorporate more detailed variables related to land use, job density, public transit accessibility indices, and traffic patterns into the models to help explain a greater share of the observed variance in coefficients.
- Exploring Mode Choice Heterogeneity: Investigate whether similar spatial heterogeneity exists in the factors influencing the choice of travel mode (car, public transit, walking), as this would be a logical and important next step.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest








| Variables | Description |
|---|---|
| HHFAMINC | Household Income |
| HHSIZE | Count of Household Members |
| HH_RACE | Race of Household Respondent |
| HOMEOWN | Home Ownership |
| LIF_CYC | Life Cycle Classification for the Household, Derived from Attributes Pertaining to Age, Relationship, and Work Status |
| HHVEHCNT | Count of Household Vehicles |
| Age_mean | Average Age of Household Persons |
| PTUSED | Count of Public Transit Usage |
| RIDESHARE | Count of Rideshare App Usage |
| URBRUR | Household in Urban/Rural Area |
| DRVRCNT_prc | Percentage of Drivers in the Household |
| HEALTH_Poor_prc | Percentage of Household Members Reporting Poor Health |
| Gender_male_prc | Percentage of Male Persons in Household |
| BORNINUS_No_prc | Percentage of Persons Not Born in the U.S. |
| EDUC_graduated_prc | Percentage of People Who Graduated in the Household |
| EDUC_some_college_prc | Percentage of Household Members with Some College Education |
| EDUC_bachelor_prc | Percentage of Household Members with a Bachelor’s Degree |
| FLEXTIME_Yes_prc | Percentage of People with Flex Time in Household |
| GT1JB_Yes_prc | Percentage of People with More than One Job in Household |
| MEDCOND_Yes_prc | Percentage of Household Members Reporting a Medical Condition |
| YOUNGCHILD_prc | Percentage of Household Members Aged 0 to 4 |
| OCCAT_Clerical_administration_prc | Percentage of People with Clerical Administration Jobs in the Household |
| OCCAT_Sales_service_prc | Percentage of People with Sales or Service Jobs in the Household |
| OCCAT_Manufacturing_construction_farming_prc | Percentage of People with Manufacturing, Construction, or Farming Jobs in the Household |
| OCCAT_Professional_managerial_technical_prc | Percentage of People with Professional, Managerial, or Technical Jobs in the Household |

| Category | Household Income |
|---|---|
| 1 | Less than $10,000 |
| 2 | $10,000 to $14,999 |
| 3 | $15,000 to $24,999 |
| 4 | $25,000 to $34,999 |
| 5 | $35,000 to $49,999 |
| 6 | $50,000 to $74,999 |
| 7 | $75,000 to $99,999 |
| 8 | $100,000 to $124,999 |
| 9 | $125,000 to $149,999 |
| 10 | $150,000 to $199,999 |
| 11 | $200,000 or more |

| Category | Race |
|---|---|
| 1 | White |
| 2 | Black or African American |
| 3 | Asian |
| 4 | American Indian or Alaska Native |
| 5 | Native Hawaiian or Other Pacific Islander |

| Category | Life Cycle Classification |
|---|---|
| 1 | one adult, no children |
| 2 | 2+ adults, no children |
| 3 | one adult, youngest child 0–5 |
| 4 | 2+ adults, youngest child 0–5 |
| 5 | one adult, youngest child 6–15 |
| 6 | 2+ adults, youngest child 6–15 |
| 7 | one adult, youngest child 16–21 |
| 8 | 2+ adults, youngest child 16–21 |
| 9 | one adult, retired, no children |
| 10 | 2+ adults, retired, no children |

| Variable | Level | Mean (SD) [Min, Max] or n (%) |
|---|---|---|
| Daily Household Trips (CNTTDHH) | | 8.0 (5.6) [1.0, 95.0] |
| Household Size (HHSIZE) | | 2.2 (1.2) [1.0, 13.0] |
| Mean Household Age (Age_mean) | | 52.3 (18.3) [11.0, 92.0] |
| Proportion of Young Children (<5) (YOUNGCHILD_prc) | | 0.0 (0.1) [0.0, 0.8] |
| Proportion of Drivers (DRVRCNT_prc) | | 0.9 (0.2) [0.0, 1.0] |
| Proportion of Males (Gender_male_prc) | | 0.5 (0.3) [0.0, 1.0] |
| Household Income Category (HHFAMINC) | | 6.3 (2.5) [1.0, 11.0] |
| Home Ownership (HOMEOWN) | Owned | 84,183 (79%) |
| | Rented | 22,104 (21%) |
| Number of Household Vehicles (HHVEHCNT) | | 2.1 (1.1) [1.0, 12.0] |
| Public Transit Trips (per month) (PTUSED) | | 1.7 (6.5) [0.0, 132.0] |
| Rideshare Trips (per month) (RIDESHARE) | | 0.5 (2.7) [0.0, 211.0] |
| Area Type (URBRUR) | Urban | 82,065 (77%) |
| | Rural | 24,222 (23%) |
| Proportion with Bachelor’s Degree (EDUC_bachelor_prc) | | 0.2 (0.3) [0.0, 1.0] |
| Proportion with Poor Health (HEALTH_Poor_prc) | | 0.0 (0.1) [0.0, 1.0] |

| Model | Weighted Avg. R² | Weighted Avg. MAE | Number of States Won (by R²) |
|---|---|---|---|
| CatBoost | 0.323 | 3.373 | 14 |
| Linear Regression | 0.321 | 3.394 | 23 |
| Random Forest | 0.315 | 3.412 | 14 |

| Variable | States Modeled | States Significant | % Significant | Positive and Sig. | Negative and Sig. | Sign Flip? |
|---|---|---|---|---|---|---|
| HHSIZE | 51 | 44 | 86.3% | 44 | 0 | No |
| YOUNGCHILD_prc | 51 | 34 | 66.7% | 0 | 34 | No |
| LIF_CYC6 | 51 | 28 | 54.9% | 28 | 0 | No |
| LIF_CYC4 | 51 | 21 | 41.2% | 21 | 0 | No |
| LIF_CYC8 | 51 | 17 | 33.3% | 16 | 1 | Yes |
| LIF_CYC10 | 51 | 16 | 31.4% | 16 | 0 | No |
| EDUC_graduated_prc | 51 | 15 | 29.4% | 15 | 0 | No |
| FLEXTIME_Yes_prc | 51 | 15 | 29.4% | 15 | 0 | No |
| LIF_CYC2 | 51 | 15 | 29.4% | 15 | 0 | No |
| PTUSED | 51 | 13 | 25.5% | 9 | 4 | Yes |
| HHFAMINC | 51 | 10 | 19.6% | 10 | 0 | No |
| MEDCOND_Yes_prc | 51 | 10 | 19.6% | 0 | 10 | No |
| GT1JB_Yes_prc | 51 | 9 | 17.6% | 9 | 0 | No |
| EDUC_bachelor_prc | 51 | 8 | 15.7% | 8 | 0 | No |
| OCCAT_Professional_managerial_technical_prc | 51 | 8 | 15.7% | 0 | 8 | No |
| BORNINUS_No_prc | 51 | 6 | 11.8% | 0 | 6 | No |
| DRVRCNT_prc | 51 | 6 | 11.8% | 5 | 1 | Yes |
| RIDESHARE | 51 | 6 | 11.8% | 4 | 2 | Yes |
| Gender_male_prc | 51 | 5 | 9.8% | 0 | 5 | No |
| HHVEHCNT | 51 | 5 | 9.8% | 4 | 1 | Yes |
| HOMEOWN2 | 51 | 5 | 9.8% | 4 | 1 | Yes |
| Age_mean | 51 | 4 | 7.8% | 0 | 4 | No |
| LIF_CYC9 | 51 | 4 | 7.8% | 4 | 0 | No |
| OCCAT_Manufacturing_construction_farming_prc | 51 | 3 | 5.9% | 1 | 2 | Yes |
| EDUC_some_college_prc | 51 | 2 | 3.9% | 2 | 0 | No |
| HH_RACE1 | 51 | 2 | 3.9% | 0 | 2 | No |
| OCCAT_Sales_service_prc | 51 | 2 | 3.9% | 1 | 1 | Yes |
| OCCAT_Clerical_administration_prc | 51 | 1 | 2.0% | 0 | 1 | No |
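
The columns of the table above can be derived from each variable's per-state coefficients and p-values. The sketch below is illustrative only: the function name `summarize_consistency`, the 0.05 significance threshold, and the toy estimates are assumptions, and the sign-flip rule (at least one significant positive and one significant negative state) is inferred from the table's rows rather than stated in the text.

```python
def summarize_consistency(estimates, alpha=0.05):
    """Summarize one variable's per-state (coefficient, p_value) pairs.

    alpha is an assumed significance threshold, not taken from the paper.
    """
    significant = [coef for coef, p in estimates.values() if p < alpha]
    positive = sum(1 for coef in significant if coef > 0)
    negative = sum(1 for coef in significant if coef < 0)
    return {
        "states_modeled": len(estimates),
        "states_significant": len(significant),
        "pct_significant": round(100 * len(significant) / len(estimates), 1),
        "positive_sig": positive,
        "negative_sig": negative,
        # Inferred rule: significant effects of both signs exist.
        "sign_flip": positive > 0 and negative > 0,
    }


# Invented (coefficient, p_value) pairs for a hypothetical predictor.
toy = {
    "CA": (0.42, 0.001),
    "TX": (-0.31, 0.020),
    "NY": (0.10, 0.400),
    "OH": (0.25, 0.030),
}
summary = summarize_consistency(toy)
print(summary)
```

Applied to the HHSIZE row, the same percentage formula gives 44 of 51 states, or 86.3%.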
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Naseralavi, S.; Soltanirad, M.; Ranjbar, E.; Lucero, M.; Gorzin, F.; Hakiminejad, Y.; Azimi, S.; Baghersad, M.; Mazaheri, A. Machine Learning Modeling of Household Trip Generation by State Using NHTS Data. Urban Sci. 2025, 9, 353. https://doi.org/10.3390/urbansci9090353

