Riding into Danger: Predictive Modeling for ATV-Related Injuries and Seasonal Patterns

All-Terrain Vehicles (ATVs) are popular off-road vehicles in the United States, with a staggering 10.5 million households reported to own at least one ATV. Despite their popularity, ATVs pose a significant risk of severe injuries, leading to substantial healthcare expenses and raising public health concerns. Gaining insights into the patterns of ATV-related hospitalizations and accurately predicting these injuries is therefore of paramount importance, as this knowledge can guide the development of effective prevention strategies and ultimately reduce ATV-related injuries and the associated healthcare costs. To this end, we performed an in-depth analysis of ATV-related hospitalizations from 2010 to 2021. Furthermore, we developed three forecasting models (Neural Prophet, SARIMA, and LSTM) to predict ATV-related injuries and evaluated their performance using the Root Mean Square Error (RMSE) accuracy metric. The LSTM model outperformed the others and could provide valuable insights to support strategic planning and resource allocation within healthcare systems. In addition, our findings highlight the urgent need for prevention programs specifically targeted toward youth and timed for the summer season.


Introduction
All-Terrain Vehicles (ATVs) are popular off-road vehicles in the United States (U.S.), with an estimated 10.5 million households owning at least one ATV in 2017 [1]. Despite their popularity, however, ATVs are known to be unstable vehicles because of their high center of gravity, narrow wheelbase, and narrow track width, and riding them can result in severe injuries and even death [2]. According to a U.S. Consumer Product Safety Commission report [3], around 504 deaths occur every year due to ATV incidents. ATV-related injuries are a significant public health concern, and hospitalization is a common outcome of ATV incidents [4]. These hospitalizations not only result in substantial morbidity and mortality but also impose a considerable economic burden on healthcare systems: estimates suggest that the average cost of care for ATV-related injuries is approximately USD 90,000 per patient in the U.S. [5]. Understanding the patterns of ATV-related hospitalizations and accurately predicting these injuries is therefore crucial for developing effective strategies to prevent ATV-related injuries and reduce healthcare costs.
ATV-related injuries follow a seasonal pattern, with most hospital admissions occurring during the summer, on holidays, and on weekends [6][7][8][9]. To predict and prevent ATV-related injuries effectively, forecasting models should therefore account for this seasonality. Some of the most popular forecasting models for seasonal time series data are Facebook's Neural Prophet, the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model, and Long Short-Term Memory (LSTM) networks [10][11][12]. These models have been increasingly used for motor-vehicle-related injury prediction, and recent studies have pointed out their effectiveness in predicting injuries and trends over time [13][14][15][16]; for ATVs, however, little has been reported on the use of injury prediction models. An artificial neural network with one hidden layer of nine nodes has been developed to predict the severity of ATV-related injuries, achieving a correct classification rate of 68.6% [17]. Multiple linear regression models have also been used to predict hospital length of stay and costs from factors associated with ATV-related injuries: length of stay was predicted by four variables (adjusted R² = 0.259), and hospital charges by six variables (adjusted R² = 0.263) [18]. To the best of our knowledge, however, no study of ATV-related injury prediction has accounted for the seasonality of the data and provided an accurate estimate of the number of injuries throughout the year.
In this study, we aimed to address this gap by developing a forecasting model for ATV-related injuries that accounts for seasonal patterns. We conducted a thorough analysis of ATV-related hospitalization records and identified key factors, such as demographics and occupational usage, to inform injury prevention strategies. We then implemented three forecasting models (Neural Prophet, SARIMA, and LSTM) and evaluated their performance. Our findings provide valuable insights into the patterns of ATV-related injuries that can aid in the development of effective prevention strategies and help reduce the economic burden on healthcare systems and insurance companies.

Materials and Methods
Data on ATV-related injuries that led to hospitalization were used to develop machine learning forecasting models. The study aimed to develop forecasting models for ATV-related injuries that account for the data's seasonality, using three algorithms: Neural Prophet, SARIMA, and LSTM. The performance of these models was evaluated using the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) accuracy metrics. Figure 1 illustrates the study's framework.

Data Source and Treatment
This study used data from the National Electronic Injury Surveillance System (NEISS) online database for the period from January 2010 to December 2021. The NEISS database is maintained by the U.S. Consumer Product Safety Commission (CPSC) and provides information on injury events related to consumer products [19]. Data were extracted for all ATV-related incidents; each record includes descriptors such as age, race, gender, diagnosis, body part injured, disposition, location of the incident, and a brief narrative describing the incident [20]. Current recommendations for ATV use among youths follow several criteria, including engine size: engines between 70 and 90 cc are recommended for youths aged 12 years or older, and engines larger than 90 cc only for youths aged 16 years or older [21]. We adopted these age thresholds when defining the age ranges for the analysis.
Data obtained from the NEISS database were processed using a script written in Python. First, the data were cleaned and preprocessed to ensure accuracy and consistency, which included removing duplicate entries and ensuring that fields were in the correct format (e.g., the incident date stored as a date). The data were then searched for references to ATVs using keywords such as "ATV", "All-Terrain Vehicle", "Four-Wheeler", and "Quad bike", as well as variations of those names (e.g., "quadbike" and "4-wheeler"). Because the NEISS database does not specify injury severity, we used the "emergency department (ED) disposition" field as a proxy for severity. We created the binary variable "Hospitalization" (yes/no): entries with ED disposition codes 2 (treated and transferred), 4 (treated and admitted/hospitalized), 5 (held for observation), and 8 (died in the emergency room) were assigned Hospitalization = yes, and all entries with any other disposition code were assigned Hospitalization = no.
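The keyword search, duplicate removal, and disposition mapping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's script: the regular expression, the helper names, and the record keys `narrative` and `disposition` are hypothetical, while the set of disposition codes counted as hospitalizations (2, 4, 5, and 8) comes from the text.

```python
import re

# Hypothetical pattern covering the ATV names listed in the text and common spelling variants
# ("quadbike", "4-wheeler", "all terrain vehicle", plural "ATVs").
ATV_PATTERN = re.compile(
    r"\b(atv|all[\s-]?terrain[\s-]?vehicle|four[\s-]?wheeler|4[\s-]?wheeler|quad[\s-]?bike)s?\b",
    re.IGNORECASE,
)

# ED disposition codes assigned Hospitalization = yes in the study.
HOSPITALIZATION_CODES = {2, 4, 5, 8}


def mentions_atv(narrative: str) -> bool:
    """Return True if the incident narrative references an ATV."""
    return ATV_PATTERN.search(narrative) is not None


def hospitalization_flag(disposition_code: int) -> str:
    """Map an ED disposition code to the binary Hospitalization variable."""
    return "yes" if disposition_code in HOSPITALIZATION_CODES else "no"


def preprocess(records: list) -> list:
    """Drop exact duplicates, keep ATV-related records, and add the Hospitalization flag.

    Each record is a dict with the hypothetical keys "narrative" and "disposition".
    """
    seen = set()
    result = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint used to detect duplicates
        if key in seen:
            continue
        seen.add(key)
        if mentions_atv(rec["narrative"]):
            result.append({**rec, "hospitalization": hospitalization_flag(rec["disposition"])})
    return result
```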
Next, the daily records of the original dataset were aggregated by month, and a "Monthly Hospitalizations" column was created to hold the total number of reported cases per month. The filtered and sorted data were then saved to an Excel worksheet. Lastly, the data were split into training and testing datasets: the testing dataset comprised all 633 ATV-related hospitalizations in 2021, and the training dataset comprised the 4688 hospitalizations from 2010 to 2020.
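A minimal sketch of the monthly aggregation and the year-based split, assuming each hospitalization is represented by its incident date as a `datetime.date`. The function names are hypothetical, and filling months without any cases with zero is an assumption the text does not state; it keeps the monthly series free of gaps, which the forecasting models expect.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_hospitalizations(
    incident_dates: Iterable[date], start_year: int = 2010, end_year: int = 2021
) -> Dict[Month, int]:
    """Count hospitalizations per (year, month), zero-filling months with no cases."""
    counts = Counter((d.year, d.month) for d in incident_dates)
    return {
        (y, m): counts.get((y, m), 0)
        for y in range(start_year, end_year + 1)
        for m in range(1, 13)
    }


def split_by_year(monthly: Dict[Month, int], test_year: int = 2021):
    """Hold out every month of `test_year` for testing; earlier months form the training set."""
    train = {k: v for k, v in monthly.items() if k[0] < test_year}
    test = {k: v for k, v in monthly.items() if k[0] == test_year}
    return train, test
```

With the study's period, this yields 132 training months (2010 to 2020) and 12 test months (2021).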

Neural Prophet Model
Neural Prophet combines a neural network with Prophet, a decomposable time-series model developed by Meta Platforms, Inc. The model is composed of modules, each of which adds a specific component to the forecast; some of these components can also be configured to interact multiplicatively with the trend. The forecast can be described as the sum of these components [22]:

$$\hat{y}_t = T(t) + S(t) + E(t) + F(t) + A(t) + L(t) \qquad (1)$$

where T(t) = trend at time t; S(t) = seasonal effect at time t; E(t) = event and holiday effects at time t; F(t) = regression effects at time t for future-known exogenous data; A(t) = auto-regression effects at time t based on past observations; L(t) = regression effects at time t for lagged observations of exogenous data. The Neural Prophet model uses a neural network to model non-linear relationships in the data and the Prophet framework to model seasonality. This combination allows the model to capture both the complex patterns and the seasonality of the data in its predictions [12,23]. The model is effective at identifying and dealing with outliers and has proved robust in handling missing data and changes in the trend [24].
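Prophet-style models, including Neural Prophet, represent the seasonal term S(t) with a Fourier series of sine and cosine terms whose period matches the seasonality (a period of 12 for yearly seasonality on monthly data). The sketch below, with hypothetical function names and illustrative coefficients, computes these features and adds component values together following the additive decomposition into T(t), S(t), E(t), F(t), A(t), and L(t). It illustrates the idea only and does not reproduce the library's internal implementation or the study's fitted model.

```python
import math


def fourier_features(t: float, period: float = 12.0, order: int = 3) -> list:
    """Return [cos(2*pi*k*t/P), sin(2*pi*k*t/P)] for k = 1..order.

    With a monthly time index t, period=12 encodes yearly seasonality.
    """
    features = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        features.extend([math.cos(angle), math.sin(angle)])
    return features


def seasonal_effect(t: float, coefficients: list, period: float = 12.0) -> float:
    """S(t) as a weighted sum of Fourier features; the weights are learned during fitting."""
    order = len(coefficients) // 2
    return sum(c * f for c, f in zip(coefficients, fourier_features(t, period, order)))


def additive_forecast(**components: float) -> float:
    """Combine component values additively, e.g. trend + seasonal + events + ..."""
    return sum(components.values())
```

Because the features repeat every `period` steps, any seasonal effect built from them has the same value in the same month of every year.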
The Neural Prophet model was implemented in Python 3.9 using the "neuralprophet" library. We used automatic changepoint selection and included yearly seasonality and U.S. holidays in the model. Further, we set the number of hidden layers to four, the learning rate to 0.005, and the model's growth to linear.

SARIMA Model
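SARIMA is a stochastic model designed to analyze and forecast time series data, particularly data that exhibit strong seasonal variation. The model is composed of autoregressive (AR), differencing (I), and moving average (MA) components, with an added seasonal component (S) to account for seasonality [13,25]. The model is summarized in Equation (2) [13]:

$$\Phi_P(B^s)\,\phi_p(B)\,(1-B)^d\,(1-B^s)^D\,y_t = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t \qquad (2)$$

where $y_t$ = observation at time t; B = backshift operator ($B y_t = y_{t-1}$); $\phi_p$ and $\theta_q$ = non-seasonal AR and MA polynomials of orders p and q; $\Phi_P$ and $\Theta_Q$ = seasonal AR and MA polynomials of orders P and Q; d and D = non-seasonal and seasonal differencing orders; s = seasonal period; $\varepsilon_t$ = white-noise error term.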
The SARIMA model was implemented in Python 3.9 using the "SARIMAX" class from the "statsmodels" library. Before training, several steps were taken to ensure that the data were modeled appropriately. The stationarity of the dataset was assessed with an Augmented Dickey-Fuller (ADF) test (α = 0.05), which indicated that the data were non-stationary (p = 0.994). After first-order differencing, the ADF test indicated that the data were stationary (p < 0.05). In addition, because the data showed annual seasonality, we applied seasonal differencing at lag 12 (yearly). Having set the non-seasonal (d) and seasonal (D) differencing orders (d = D = 1) and the seasonal period (s = 12), we used the autocorrelation and partial autocorrelation functions to identify the remaining orders (p, q, P, and Q). To verify that these choices were the best, we used a grid search over a range of possible values for the estimated parameters and selected the model with the lowest Akaike Information Criterion (AIC), a measure of relative model quality. The chosen model, with an AIC of 689.60, was SARIMA (2, 1, 2) × (2, 1, 2)₁₂.
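The differencing and order-selection steps can be sketched as below. The helper names are hypothetical, and the search range of 0 to 2 for p, q, P, and Q is an assumption, since the text does not state the range used. The statsmodels calls (`adfuller`, `SARIMAX(...).fit(disp=False)`, and the `.aic` attribute of the fitted results) are imported inside the functions that need them, so the pure-Python helpers run without statsmodels installed.

```python
from itertools import product


def difference(series: list, lag: int = 1) -> list:
    """Return x[t] - x[t - lag]; lag=1 is first-order, lag=12 is yearly seasonal differencing."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def adf_pvalue(series: list) -> float:
    """p-value of the Augmented Dickey-Fuller test (p < 0.05 suggests stationarity)."""
    from statsmodels.tsa.stattools import adfuller

    return adfuller(series)[1]


def candidate_orders(max_order: int = 2, d: int = 1, D: int = 1, s: int = 12):
    """Yield every (p, d, q) x (P, D, Q, s) combination with p, q, P, Q in 0..max_order."""
    for p, q, P, Q in product(range(max_order + 1), repeat=4):
        yield (p, d, q), (P, D, Q, s)


def select_by_aic(train: list, max_order: int = 2):
    """Fit each candidate SARIMA model; return (aic, order, seasonal_order) of the lowest AIC."""
    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    best = None
    for order, seasonal_order in candidate_orders(max_order):
        try:
            fitted = SARIMAX(train, order=order, seasonal_order=seasonal_order).fit(disp=False)
        except (ValueError, np.linalg.LinAlgError):
            continue  # skip combinations that cannot be estimated
        if best is None or fitted.aic < best[0]:
            best = (fitted.aic, order, seasonal_order)
    return best
```

Applying `difference` with lag 12 and then lag 1 mirrors the d = D = 1, s = 12 setting, and the last candidate in the grid corresponds to the selected SARIMA (2, 1, 2) × (2, 1, 2)₁₂ specification.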

LSTM Model
The LSTM model is a type of recurrent neural network that is well suited to complex time series data with intricate patterns and seasonality. Its ability to learn from past observations enables it to make accurate predictions. What sets this model apart is its memory cells, which can capture correlations in the data over both short and long time spans, an improvement over traditional recurrent neural networks that has been highlighted in a previous study [26].
The model was implemented in Python 3.9 using the "LSTM" layer from the "tensorflow" library. Before the model was fitted, the "Monthly Hospitalizations" field was normalized to a range from zero to one, because this model can be sensitive to the scale of the input data [27]. Finally, an optimal set of hyperparameters was selected based on the best RMSE obtained by the model and the lowest training and validation loss values. A summary of the parameters used in the model is shown in Table 1.
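Before an LSTM can be trained, the scaled monthly series is usually reframed as supervised (input window, next value) pairs. The sketch below shows min-max scaling to [0, 1], its inverse, and windowing. The 12-month window length and the function names are assumptions, not the values listed in Table 1. For a Keras `LSTM` layer, the windows would then be reshaped to the three-dimensional shape (samples, timesteps, features).

```python
def min_max_scale(values: list):
    """Scale values to [0, 1] and return the (min, max) bounds needed to invert predictions."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values], (lo, hi)


def invert_scale(scaled: list, bounds: tuple) -> list:
    """Map scaled predictions back to monthly hospitalization counts."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]


def make_windows(series: list, n_steps: int = 12):
    """Build (input window, next value) pairs for supervised training.

    n_steps=12 (one year of monthly data) is an assumed window length.
    """
    inputs, targets = [], []
    for i in range(len(series) - n_steps):
        inputs.append(series[i:i + n_steps])
        targets.append(series[i + n_steps])
    return inputs, targets
```

Computing the scaling bounds on the training years only, and reusing them for 2021, avoids leaking information from the test period into training.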
The model's training and validation performance was analyzed through the loss values obtained during the training and validation steps (Figure 2). To avoid overfitting, we implemented an early stopping function with a patience of 50 epochs. The final model was trained for a total of 198 epochs and reached a loss of 0.006 for both training and validation (Figure 2).
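Patience-based early stopping halts training once the validation loss has not improved for a set number of consecutive epochs. A minimal pure-Python sketch of that rule follows, using a hypothetical function name. In Keras, comparable behavior is provided by `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=50)`, although the text does not state which options the study used.

```python
def early_stopping_epoch(val_losses: list, patience: int = 50, min_delta: float = 0.0):
    """Return (stop_epoch, best_epoch), both 1-based, under patience-based early stopping.

    Training stops once `patience` consecutive epochs pass without the validation
    loss improving on the best value seen so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran through every epoch provided.
    return len(val_losses), best_epoch
```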

Accuracy Assessment of the Models
The forecasting models' accuracy was assessed through three accuracy metrics: MAE, MAPE, and RMSE. These metrics quantify the difference between the predicted and actual values, allowing us to evaluate the overall performance of the models. MAE measures the average absolute difference between the predicted and actual values, while MAPE measures the average absolute percentage difference. RMSE, on the other hand, is the square root of the average squared difference between the predicted and actual values, which expresses the error in the same unit as the actual values. These metrics are defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right|$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_i - \hat{x}_i}{x_i}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2}$$

where $x_i$ = actual value; $\hat{x}_i$ = predicted value; n = number of observations.
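A direct translation of the three metric definitions into Python follows, as a minimal sketch with hypothetical function names; it assumes equal-length lists of actual and predicted monthly counts.

```python
import math


def _check(actual: list, predicted: list) -> None:
    """Validate that both series are non-empty and aligned month by month."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mae(actual: list, predicted: list) -> float:
    """Mean absolute error, in the units of the data (hospitalizations per month)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: list, predicted: list) -> float:
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: list, predicted: list) -> float:
    """Root mean squared error: square root of the mean squared difference."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Because RMSE squares the errors before averaging, it penalizes large misses in individual months, such as a peak-season month, more heavily than MAE does. MAPE is undefined for any month with zero actual hospitalizations.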

Demographic Characteristics and Overall Trends in Hospitalization
From January 2010 to December 2021, 5321 ATV-related ED visits resulting in hospitalization were recorded in the NEISS database. Males accounted for the majority of hospitalizations, outnumbering females by approximately 3:1, as shown in Table 2. The age group with the largest share of hospitalizations was youths between 12 and 15 years old, representing 16% of all hospitalizations. Among female patients, this pattern was even more pronounced, with 19% of hospitalizations occurring in the 12-15 age group. Among male patients, in contrast, the highest share of ED visits occurred in the 12 years or younger age group (15.79%), followed closely by the 30-39 (15.26%) and 12-15 (15.21%) age groups. The high rate of ATV-related incidents among individuals younger than 16 is consistent with previous studies [28][29][30][31]. Recreational or sports areas were the most frequently reported incident location, accounting for 35% of hospitalizations with a recorded location and 17.6% of all hospitalizations, including those without a recorded location (Table 2). Several factors may explain the high number of incidents in these locations. In the U.S., ATVs are mainly used for recreational purposes such as off-road adventures and sports events, making these areas a prime location for incidents. According to the U.S. Government Accountability Office, 79% of all ATV riders were using their vehicles for recreational purposes by 2008 [32]. Furthermore, recreational and sports areas are often open spaces with rough terrain, which can increase the risk of incidents [33], especially for inexperienced riders. Riders engaged in ATV sports and recreation may also take more risks and be less cautious, further increasing the likelihood of incidents. Lastly, a lack of proper training, safety measures, and enforcement of safety guidelines in these areas can contribute to the high number of injuries [34,35].
Riding ATVs on streets or highways accounted for 9.7% of all hospitalizations (Table 2). This statistic highlights the risks of using ATVs, which are designed for off-road activities, on paved surfaces [8,29]. These vehicles are equipped with low-pressure tires designed to provide traction on rough and slippery surfaces. However, on smooth surfaces such as pavement, the tires have higher friction and adhesion with the road, which can cause the vehicle to shift laterally and increase the risk of rollover, especially when turning at high speeds [29].
Farms and ranches accounted for one of the lowest numbers of hospitalizations, at 0.7% of all reported cases (Table 2). Nevertheless, the agricultural setting is particularly dangerous for riders, as it may present several factors that contribute to loss of vehicle control [36,37]. Consistent with this, the fatality rate for ATV-related incidents in the agriculture/forestry/fishing/hunting industry was reported to be 100 times greater than in all other industries in the U.S. [38]. It should therefore be noted that the data on farm and ranch incidents in this study may be incomplete. Under the Occupational Safety and Health Act of 1970 [39], agricultural properties with ten or fewer employees are exempt from reporting work incidents, which could lead to underrepresentation of these incidents in the data [40].
The overall trend in ATV-related hospitalizations pointed to an increase in the number of cases over the years, as illustrated in Figure 3. The analysis also revealed that the highest monthly numbers of hospital admissions for ATV-related injuries occurred in 2020 and 2021, with a peak of 91 hospitalizations in May 2020. This trend is consistent with a previous study that reported a 78% increase in ATV-related injuries after the onset of shutdown measures due to the COVID-19 pandemic [41].
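One simple way to quantify an upward trend like the one described above is to aggregate monthly counts into yearly totals and fit an ordinary least-squares line. This section does not state that a trend line was fitted, so the sketch below is an assumed illustration with hypothetical data; a single slope summarizes direction only and is sensitive to a sudden surge such as the one observed in 2020–2021.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def annual_totals(monthly: Iterable[Tuple[int, int, int]]) -> Dict[int, int]:
    """Sum (year, month, count) records into {year: total}, sorted by year."""
    totals: Dict[int, int] = defaultdict(int)
    for year, _month, count in monthly:
        totals[year] += count
    return dict(sorted(totals.items()))


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys on xs (cases per year when xs are years)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical monthly records (year, month, count).
monthly = [(2019, 6, 30), (2019, 7, 40), (2020, 6, 45), (2020, 7, 55), (2021, 6, 60), (2021, 7, 70)]
totals = annual_totals(monthly)
print(totals)                                          # {2019: 70, 2020: 100, 2021: 130}
print(ols_slope(list(totals), list(totals.values())))  # 30.0
```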
Seasonality analysis showed that the highest number of hospitalizations occurred during the summer months (June, July, and August), which accounted for 35% of all ATV-related injury cases (Figure 4). This finding aligns with both the results in Table 2 and previously reported seasonality patterns of ATV incidents [6–9], suggesting that ATV-related incidents are more likely to happen during warmer months, as most ATVs are used for recreational purposes (Table 2).
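The summer share and peak month can be computed from per-month totals such as those plotted in Figure 4, as in the minimal sketch below. The counts are hypothetical, and the `SUMMER` tuple simply encodes the June–August definition used above. For reference, a 35% share for three months is above the 25% expected if cases were spread evenly across the calendar, but it is a plurality rather than a majority of cases.

```python
from typing import Dict, Tuple

SUMMER = (6, 7, 8)  # June, July, August


def seasonal_profile(monthly_totals: Dict[int, int]) -> Tuple[float, int]:
    """Return (summer share in %, peak calendar month) from {month (1-12): cases}."""
    total = sum(monthly_totals.values())
    if total == 0:
        raise ValueError("no cases")
    summer = sum(monthly_totals.get(m, 0) for m in SUMMER)
    peak = max(monthly_totals, key=monthly_totals.get)  # first month with the maximum count
    return 100.0 * summer / total, peak


# Hypothetical per-month totals summing to 100 cases.
totals = {1: 5, 2: 5, 3: 5, 4: 10, 5: 10, 6: 15, 7: 20, 8: 15, 9: 5, 10: 5, 11: 3, 12: 2}
print(seasonal_profile(totals))  # (50.0, 7)
```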

Performance Assessment of the Models
The LSTM model outperformed the other two models, with an RMSE of 3.71, whereas the SARIMA and Neural Prophet models had RMSE values of 9.21 and 8.96, respectively (Table 3). Figure 5 compares the values predicted by the models with the actual values for the validation dataset.
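To put the RMSE values from Table 3 in relative terms, the arithmetic below expresses the LSTM's RMSE as a percentage reduction against each alternative model. It uses only the three RMSE values reported above; the helper name `relative_reduction` is illustrative.

```python
# RMSE values reported in the text (Table 3).
RMSE = {"LSTM": 3.71, "SARIMA": 9.21, "Neural Prophet": 8.96}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate`'s error is lower than `baseline`'s."""
    return 100.0 * (baseline - candidate) / baseline


for name in ("SARIMA", "Neural Prophet"):
    pct = relative_reduction(RMSE[name], RMSE["LSTM"])
    print(f"LSTM vs {name}: {pct:.1f}% lower RMSE")
# LSTM vs SARIMA: 59.7% lower RMSE
# LSTM vs Neural Prophet: 58.6% lower RMSE
```

Because RMSE is scale-dependent, such comparisons are meaningful only when, as here, all models are evaluated on the same validation dataset.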

Forecasting 2024, 6 274
The SARIMA model has been successfully used to predict road traffic incidents [13,42,43], as it is one of the most effective linear models for forecasting seasonal time series [13]. In addition, this model can handle the data's secular trend, seasonal variation, and autocorrelation without the need for complex transformations or additional surrogate variables [42]. However, it may not accurately predict non-linear data [13], such as the number of ATV-related hospitalizations, which likely explains its relatively low accuracy (RMSE = 9.21). Moreover, additive models such as SARIMA and Prophet are prone to errors when dealing with unstable data [13]. Data analysis in this study revealed a sudden increase in the monthly number of hospital admissions for ATV-related injuries from 2020 to 2021 (Figure 3), which likely also affected the performance of the SARIMA and Neural Prophet models.
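The differencing step that lets SARIMA absorb trend and seasonality can be sketched directly with the backward shift operator $B$ (where $B x_t = x_{t-1}$): the series is transformed by $(1-B)^d(1-B^s)^D$, with $d$ and $D$ the non-seasonal and seasonal differencing degrees and $s$ the period length. The sketch below is a minimal illustration only. The orders selected in the study are not restated in this section, so any choice such as $s = 12$ for monthly data is an assumption, and library implementations (e.g., the `order` and `seasonal_order` arguments of statsmodels' `SARIMAX`) perform this step internally.

```python
from typing import List, Sequence


def difference(x: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag): y_t = x_t - x_{t-lag}; the first `lag` points are lost."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def sarima_difference(x: Sequence[float], d: int, D: int, s: int) -> List[float]:
    """Apply (1 - B)^d (1 - B^s)^D to x. The operators commute, so order is irrelevant."""
    y = list(x)
    for _ in range(D):
        y = difference(y, lag=s)  # seasonal differencing removes a fixed pattern of period s
    for _ in range(d):
        y = difference(y, lag=1)  # ordinary differencing turns a linear trend into a constant
    return y


# Hypothetical series: linear trend plus a period-4 seasonal pattern.
pattern = [0, 5, 0, -5]
x = [t + pattern[t % 4] for t in range(12)]
print(sarima_difference(x, d=0, D=1, s=4))  # [4, 4, 4, 4, 4, 4, 4, 4]
print(sarima_difference(x, d=1, D=1, s=4))  # [0, 0, 0, 0, 0, 0, 0]
```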
In contrast to the previous models, LSTM has strong capabilities in processing non-linear and unstable data [13,44], which explains its superior performance in predicting ATV-related hospitalizations (RMSE = 3.71). Similar results have been reported previously: Feng et al. [13] used LSTM, SARIMA, and Prophet models to predict the number of road traffic incidents in Northeast China and found that LSTM outperformed the other models, owing to the non-linear nature of the data and the presence of disturbances within it.
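Before an LSTM can learn from a single monthly series, the series is usually reframed as supervised samples: each input is a window of consecutive past values and the target is the next value. The sketch below shows this framing in plain Python. The lookback length used in the study (see the hyperparameters in Table 1) is not restated in this section, so `lookback=12` is an assumption, as is min-max scaling with the minimum and maximum taken from the training portion only, which avoids leaking validation information. Libraries such as Keras expect the resulting inputs reshaped to `(samples, timesteps, features)`.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Split a series into inputs of `lookback` consecutive values and next-step targets."""
    if not 1 <= lookback < len(series):
        raise ValueError("lookback must be between 1 and len(series) - 1")
    n = len(series) - lookback
    X = [list(series[i:i + lookback]) for i in range(n)]
    y = [series[i + lookback] for i in range(n)]
    return X, y


def minmax_fit_transform(train: Sequence[float], other: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Scale both series using min/max from the training series only.

    Values in `other` may fall outside [0, 1] if they exceed the training range.
    """
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # guard against a constant training series

    def scale(v: float) -> float:
        return (v - lo) / span

    return [scale(v) for v in train], [scale(v) for v in other]


# Hypothetical usage: 36 months of data, first 24 for training, 12-month lookback.
series = list(range(36))
train_scaled, _ = minmax_fit_transform(series[:24], series[24:])
X, y = make_windows(train_scaled, lookback=12)
print(len(X), len(X[0]), len(y))  # 12 12 12
```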

Insights and Recommendations for Developing Effective Safety Guidelines and Prevention Programs
The findings of this study highlight the need to develop and implement effective safety guidelines and prevention programs to reduce the number of ATV-related hospitalizations, which follow a clear seasonal trend. We identified key characteristics of ATV-related hospitalizations, including demographic characteristics, incident locations, and seasonal trends, which could inform stakeholders (e.g., funding agencies, hospitals, and insurance companies). According to the data analysis, the largest share of ED visits (35%) occurred during the summer months, with a peak in July. Moreover, a high incidence of injuries was observed among youth younger than 16 years old, and sports and recreation areas were among the leading locations by number of hospitalizations. These findings suggest that targeted prevention programs should be implemented during the summer season to reduce the incidence of ATV-related injuries. A hierarchy of controls was used as a guideline to identify possible solutions for the incident patterns reported in this study [45], as illustrated in Figure 6.
To address the concerning trend of incidents involving youth riders, different approaches could be considered. Awareness programs in schools before summer breaks could provide educational materials and presentations to students about ATV safety, including information on proper helmet use and safe riding practices. Additionally, schools could partner with local ATV dealerships to offer hands-on safety training courses for students. Schools could also provide resources and information to parents so that they are aware of the risks associated with ATV use and the steps they can take to protect their children. Lastly, youth are particularly vulnerable to ATV-related incidents due to their physical, cognitive, and emotional immaturity [40,46–48]. Stricter laws and regulations on ATV use by minors, such as age restrictions, helmet laws, and limits on when and where minors can operate ATVs, could help mitigate this risk.
Further, promoting the use of vehicle safety structures could help reduce ATV-related hospitalizations. Several successful programs have promoted the use of Rollover Protective Structures (ROPS) on agricultural tractors in different countries. In Sweden, after the share of unprotected tractors (without ROPS) decreased from 24% to 8%, the fatality rate (number of deaths per 100,000 tractors) was observed to have fallen to zero. A similar trend was observed in Australia after the implementation of a ROPS rebate program, where the share of unprotected tractors decreased from 24% to 7%. The program was estimated to have prevented approximately two deaths per year, and a statistical estimate of 0.9 rollover deaths per year was observed after the program [49]. In the United States, the success of the ROPS rebate program in the State of New York led to the creation of the ongoing National ROPS Rebate Program (NRRP). Participating farmers have rated the NRRP highly: approximately 99.5% of participants would recommend the program to other farmers [50]. Similar to ROPS, Crush Protection Devices (CPDs) were developed to protect ATV riders in the event of a rollover. It has been estimated that 77% of ATV-related rollover deaths could have been avoided if the vehicles had been fitted with a CPD [36]. Despite this, no ongoing efforts to promote their use in the United States have been reported.
The LSTM model could be used to inform strategic planning and resource allocation in the public health system. Hospitals could use the model's predictions to plan the resources and number of health practitioners that will be needed. For instance, the total estimated cost of ATV-related hospitalizations for 2021 based on the reported data is approximately USD 56,970,000.00, compared to USD 56,128,991.00 based on the LSTM model prediction [5]. Moreover, the model can provide both short- and long-term predictions, which could support stakeholders' strategic planning over different time horizons.
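The cost comparison above is simple arithmetic on hospitalization counts. The sketch below assumes the approximate per-patient cost of USD 90,000 cited in the Introduction [5]; this section does not state the exact unit cost used, so that value is an assumption. Under it, the reported total corresponds to 633 hospitalizations, and the LSTM-based estimate is about 1.5% below the reported total.

```python
COST_PER_CASE_USD = 90_000.0  # assumption: approximate per-patient cost cited in the Introduction [5]


def estimated_cost(cases: float, cost_per_case: float = COST_PER_CASE_USD) -> float:
    """Total hospitalization cost for a (possibly predicted, non-integer) number of cases."""
    return cases * cost_per_case


def implied_cases(total_cost: float, cost_per_case: float = COST_PER_CASE_USD) -> float:
    """Number of cases implied by a total cost at a fixed per-case cost."""
    return total_cost / cost_per_case


reported_total = 56_970_000.00  # 2021, reported data
lstm_total = 56_128_991.00      # 2021, LSTM prediction
print(implied_cases(reported_total))                                   # 633.0
print(round(implied_cases(lstm_total), 1))                             # 623.7
print(f"{100 * (reported_total - lstm_total) / reported_total:.2f}%")  # 1.48%
```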

Strengths and Limitations
Strengths of this study include the use of a large sample (n = 5321) from a national database to provide a comprehensive assessment of ATV-related hospitalizations in the U.S. over a 12-year period (January 2010 to December 2021). We also conducted a thorough analysis of demographic characteristics, incident locations, and seasonal trends to identify key characteristics of ATV-related hospitalizations that could inform future injury prevention programs and changes in safety guidelines. Lastly, our results indicate that the LSTM model is effective for predicting ATV-related injuries. Consequently, it could support the development of more effective injury prevention strategies and help reduce healthcare costs by informing strategic planning.
On the other hand, this study has a few limitations. The dataset lacks detail and additional variables that could enhance the analysis. It did not include any specification of injury severity other than ED disposition, which can influence the results. The absence of a detailed description of the occupational use of ATVs also limits the impact of the results. Moreover, data on farm and ranch incidents may be underrepresented due to reporting exemptions, which could limit the generalizability of the findings to these settings. Despite these limitations, this paper provides important information for public health practitioners and policymakers to inform targeted interventions and reduce the burden of ATV-related injuries.

Conclusions
This study provides valuable insights into the characteristics and trends of ATV-related hospitalizations in the United States. Our analysis identified a clear seasonal trend, with the largest share of incidents occurring during the summer months and a peak in July. The high incidence of injuries among youth riders (<16 years old) and in sports and recreation areas highlights the need for targeted prevention programs to reduce the number of ATV-related injuries. The LSTM model was the most accurate of the three models evaluated, outperforming the SARIMA and Neural Prophet models in predicting ATV-related injuries, and it could be used to develop more effective prevention strategies. The implementation of safety guidelines and prevention programs, such as awareness programs in schools, hands-on safety training courses, and information resources for parents, is a crucial step in reducing the burden of ATV-related injuries on individuals, families, and healthcare systems. Policymakers and public health practitioners should work together to implement effective prevention programs and allocate resources effectively, leveraging the insights provided by this study to make data-driven decisions. Ultimately, reducing ATV-related injuries requires a multi-faceted approach that involves education, awareness, policy, and advocacy efforts to promote safe and responsible ATV use.
where $\nabla$ = difference operator; $D$ = degree of seasonal differencing; $d$ = degree of non-seasonal differencing; $x_t$ = time series; $B$ = backward shift operator; $\varepsilon_t$ = error term at time $t$; $s$ = period length.

Figure 2. Training and validation losses for the LSTM model.

Figure 3. Number of ATV-related hospitalizations for the period from January 2010 to December 2021.

Figure 4. Total cases of ATV-related hospitalizations per month for the period from January 2010 to December 2021.

Figure 5. Comparison between the actual and predicted numbers of hospitalizations obtained by the models.

Figure 6. Hierarchy of controls and the possible solutions based on the reported ATV incidents. Adapted from [45].

Table 1. Summary of hyperparameters used for tuning the LSTM model.

The model's training and validation performance were analyzed through the loss values obtained during both training and validation steps (Figure

Table 2. Demographic and location characteristics of ATV-related hospitalizations by gender in the U.S. from January 2010 to December 2021.

Table 3. Accuracy metrics of the prediction models.