Insights into Household Electric Vehicle Charging Behavior: Analysis and Predictive Modeling

Abstract: In the era of burgeoning electric vehicle (EV) popularity, understanding the patterns of EV users’ behavior is imperative. This paper examines the trends in household charging sessions’ timing, duration, and energy consumption by analyzing real-world residential charging data. By leveraging the information collected from each session, a novel framework is introduced for the efficient, real-time prediction of important charging characteristics. Utilizing historical data and user-specific features, machine learning models are trained to predict the connection duration, charging duration, charging demand, and time until the next session. These models enhance the understanding of EV users’ behavior and provide practical tools for optimizing the EV charging infrastructure and effectively managing the charging demand. As the transportation sector becomes increasingly electrified, this work aims to empower stakeholders with insights and reliable models, enabling them to anticipate the localized demand and contribute to the sustainable integration of electric vehicles into the grid.


Introduction
The increasing adoption of electric vehicles (EVs) brings both opportunities and challenges to the electric grid [1]. While EVs offer environmental benefits, their charging behavior can have a significant impact on grid stability and reliability. The rise in electricity demand for charging, especially in low-voltage systems, can lead to conflicts and affect the lifespan of transformers [2]. Other studies have shown that the introduction of EVs can result in a considerable increase in transformers' Loss of Life (LoL). In [3], a 10× increase in LoL was observed when EVs were introduced, and the annual LoL in urban areas could increase from 0.002 to 0.014. Interestingly, the charging scenario, whether slow or fast charging, has a contrasting effect on the power equipment strain. Slow charging, typically performed at home during peak afternoon hours, puts more strain on the power equipment compared to fast charging during off-peak hours. Furthermore, EV usage can accelerate the aging of distribution transformers (DTs), as analyzed in [4] for an apartment complex with EV chargers. Realistic EV charging demand profiles were generated, and it was found that DT aging could be expedited by up to 40% with an EV penetration ratio of up to 30%.
The study also highlighted the potential benefits of integrating PV sources to enhance DTs' reliability. It is important to consider realistic charging profiles when assessing the impact of the EV load on the grid. The concerns regarding EV charging impacts were reiterated in [5], where it was shown that frequent charging throughout the day can have a significant negative effect on the performance of distribution transformers. The addition of more public fast chargers can lead to transformer overloading, even with fewer EVs. Uncoordinated EV charging patterns can significantly worsen the impact on the electric grid, necessitating costly upgrades and investments. This uncontrolled behavior strains the
Challenges in the electric network due to EV charging are predicted in [14] using the data from charging points, tracking real EV users' charging and travel behavior over two years. Despite a focus on public charging stations, there is a need to understand residential charging patterns with the expected surge in EV adoption.
Predictive modeling of EV charging behavior has seen a surge in interest, with the employment of diverse approaches and methodologies [15][16][17][18][19][20][21][22]. Commonly, supervised machine learning techniques, like decision trees, Random Forests (RFs), support vector machines (SVMs), and Artificial Neural Networks (ANNs), are utilized. For instance, [23] employs XGBoost to predict EV departure times at public charging stations, achieving a mean absolute error (MAE) of 82 min. The authors of [24] predict the arrival and departure times at a university campus with mean absolute percentage errors (MAPEs) of 2.85% and 3.7%, respectively. Ensemble models, as seen in [18], use RFs, Naive Bayes (NB), and ANNs to predict household EV charging, achieving high true positive rates and accuracies.
Regression models, exemplified in [15], showcase the effectiveness of XGBoost in predicting the energy requirements using public charging station data, boasting high R² scores and low mean absolute errors. Some studies go beyond user behavior analysis to predict various charging outcomes, such as in [25], exploring the statistical characteristics of the charging duration, connection duration, and EV demand profiles. Anticipating EV charging demand is also explored in [26,27], predicting urban fast-charging demands and the popularity of charging infrastructure, respectively.
Deep learning models, including recurrent neural networks (RNNs), long short-term memory (LSTM), and gated recurrent units (GRUs), have gained traction for capturing complex patterns and dependencies in EV charging data. In [28], LSTM-based models outperform the traditional ANN models for charging load forecasting. The authors of [29] employ multiple RNN-based models for hourly charging load prediction, demonstrating the effectiveness of deep learning in capturing complex patterns.
Clustering techniques, such as k-means, hierarchical clustering, expectation-maximization algorithms, GMMs, and DBSCAN, are valuable for identifying distinct groups within charging data, revealing patterns and preferences. Hybrid estimators, like the combination of GKDE and DKDE proposed in [30], enhance the prediction accuracy for session duration and energy consumption. These clustering techniques empower researchers to develop effective strategies for managing charging infrastructure and optimizing resource allocation.
This study aims to take a comprehensive approach to the prediction of individual residential charging sessions, building on some of the approaches discussed above. A larger set of features is extracted and engineered from a four-year set of real data to capture as much information as possible for the modeling of future users' behavior. A unique regression framework is proposed, which leverages the historical information typically collected for each EV user, but does not rely on computationally expensive user-specific models or the inclusion of user ID as a categorical feature. While several previous studies have shown that the inclusion of user IDs can lead to more accurate predictions, such models cannot be generalized to new users and can become computationally prohibitive as the number of included users grows. A diverse selection of supervised machine learning models is examined, and the results are compared to linear and statistical approaches. This research aims not only to enhance the accuracy of EV charging predictions, but also to deepen the understanding of the interaction between various charging variables.

EV Users' Charging Behavior Analysis
This Section focuses on analyzing EV users' charging behavior at household charging stations using real collected data. The data span from January 2019 to December 2022, sourced from a total of 576 residential Level 2 charging stations across Omaha, NE, each equipped with a single-phase 40 A, 240 V charging port. A total of 265,340 charging sessions are recorded. For each charging session, comprehensive information is collected, including the unique EV users' IDs, start and end times for connection and charging, and kWh consumed. Table 1 provides a summary of the active users, number of sessions, and energy consumption in each year, highlighting the increased adoption of EVs in the Omaha area. Before further analysis, the dataset is cleaned to address the missing or potentially erroneous values. Sessions shorter than 5 min, or in which less than 1 kWh was consumed, are removed. Many of these sessions are indicative of connection problems and are not representative of intentional charging behavior. In addition, several sessions contained unrealistically high values for certain variables, such as charging demands exceeding the battery capacity of current EVs. To eliminate such outliers and examine trends in typical charging behavior, sessions containing values above the 95th percentile are excluded. The resulting dataset contains 228,988 valid charging sessions. The following Sections summarize the trends in these data across several variables of interest for EV charging behavior.
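The two cleaning steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the field names (`connection_h`, `charging_h`, `energy_kwh`), the use of connection duration for the 5 min threshold, the set of variables subject to the 95th-percentile cut, and the nearest-rank percentile definition are all hypothetical choices that the text does not specify.

```python
import math

# Hypothetical field names; the paper does not list its column names.
FIELDS = ("connection_h", "charging_h", "energy_kwh")


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def clean_sessions(sessions, min_minutes=5, min_kwh=1.0, upper_q=95):
    """Return the sessions that survive both cleaning steps."""
    # Step 1: drop very short sessions and sessions with almost no energy.
    # Assumption: "shorter than 5 min" refers to the connection duration.
    kept = [
        s for s in sessions
        if s["connection_h"] * 60 >= min_minutes and s["energy_kwh"] >= min_kwh
    ]
    if not kept:
        return []
    # Step 2: drop sessions above the upper percentile of any variable.
    caps = {f: percentile([s[f] for s in kept], upper_q) for f in FIELDS}
    return [s for s in kept if all(s[f] <= caps[f] for f in FIELDS)]
```

Computing the percentile caps after the first filter, rather than before it, is a design choice made here so that spurious sessions do not shift the thresholds; the paper does not state the order it used.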

EV Connection Start Time
The connection start time is the moment a user connects their vehicle for charging, which is crucial for analyzing charging behavior and grid planning. Figure 1 illustrates the hourly start time distribution. The number of sessions rises steadily from 6-7 a.m. and peaks at around 5 p.m., when many drivers return home. The number of sessions declines later in the evening, except for a spike at 10-11 p.m., possibly attributable to late shifts or to users choosing to charge before bed.

EV Connection End Time
The connection end time is the time when a user disconnects from the port, not the time when charging completes. This is an important attribute for the potential scheduling of charging sessions, or vehicle-to-grid strategies. Figure 2 illustrates the hourly distribution of end times in the full dataset. Predominantly, the sessions conclude between 7 and 8 a.m., experiencing a swift decline throughout the day. This trend suggests that the users typically disconnect before departing for work or engaging in daily activities, but a significant percentage of users do disconnect at various times in the afternoon or evening.

Connection Duration
The connection duration is the time between connecting and disconnecting an electric vehicle from a charging station, indicating the total time that the vehicle is available for charging (or discharging). This can range from a few minutes to several hours. Figure 3 displays the distribution of connection durations in the dataset in one-hour increments. Sessions longer than 24 h are grouped as a single bar and comprise approximately 2% of the sessions. The average connection duration is 8.8 h, but the distribution is highly irregular, with peaks in the 1-3 h range, as well as the 11-15 h range. A total of 50% of the sessions last 9 h or more.

Charging Duration
The charging duration is the time an EV draws power from the station each session. This may be the same as the connection duration if a user disconnects before the battery is full, or it may be much shorter. This duration is influenced by the charging power and battery capacity, but also by other user behavior, such as the miles driven between sessions. Figure 4 illustrates the distribution of charging durations for the dataset in one-hour increments. The average charging duration is 2.1 h, but the distribution is heavily skewed. The majority of the charging durations are 1-2 h, with only 9% exceeding 5 h. These data aid in optimizing charging, managing infrastructure, and planning for factors like availability, time, and costs.

Idle Duration
The idle duration represents the time an EV stays connected to a charging station without actively charging. This duration, calculated by subtracting charging time from connection time, can occur for various reasons, like completed charging or interrupted sessions. Like the connection duration, this metric is extremely important for scheduling or vehicle-to-grid strategies, which make use of the hours that the cars are connected to the grid. Figure 5 illustrates the distribution of idle durations in the full dataset in one-hour increments. The average idle duration is 8.7 h, but the distribution is again highly irregular, with approximately 10.4% of the sessions having a duration under one hour, and a smaller peak in the 9-11 h range. Less than 1% of the sessions have an idle duration longer than 24 h.

Charging Demand
The charging demand signifies the energy consumed by an EV during a charging session. This is largely proportional to charging duration, affected by the charging power, battery specifications, and users' driving behavior. These data are crucial for residential grid planning in aggregated areas and are a key constraint for scheduling charging sessions. Figure 6 illustrates a histogram of the charging demand for the sessions in this dataset in 1 kWh increments. The charging demand follows a skewed distribution similar to the charging duration. The average consumption is 12.9 kWh, but 50% of the sessions consume less than 10 kWh, and 90% of the sessions consume less than 33 kWh.

Time to Next Charge
The time to next charge is measured from the start of the current session to the start of the next session. This parameter is one way to anticipate when charging sessions will occur for individual users or groups. The accurate prediction of this is important for anticipating the grid demand, or the scheduling of charging sessions. Figure 7 illustrates a histogram for the time to next charge in this dataset. There is a very wide distribution, with a peak near 24 h, indicating the common daily periodicity of charging behavior. The average is slightly longer than a day, however, at 26 h, as it is somewhat common for users to not charge every day. Smaller sub-peaks exist at 24 h intervals, indicating that even if a user skips a day, they are more likely to charge at a similar time as their last session. The standard deviation of this parameter is 23 h, which is equal to its median, indicating a relatively large amount of variance in the data.
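As an illustration of how the derived quantities in this Section follow from the raw session records, the sketch below computes the idle duration (connection time minus charging time) and the start-to-start time to next charge per user. The function names and the `(user_id, start)` input layout are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import datetime


def idle_hours(connection_h, charging_h):
    """Idle duration: hours connected without drawing power."""
    return connection_h - charging_h


def time_to_next_charge(sessions):
    """Map (user_id, start) -> hours until that user's next session start.

    `sessions` is an iterable of (user_id, start_datetime) pairs. The last
    session of each user has no successor and is therefore left out.
    """
    by_user = defaultdict(list)
    for user_id, start in sessions:
        by_user[user_id].append(start)

    gaps = {}
    for user_id, starts in by_user.items():
        starts.sort()
        # Pair each session start with the following one for the same user.
        for current, following in zip(starts, starts[1:]):
            delta = following - current
            gaps[(user_id, current)] = delta.total_seconds() / 3600
    return gaps


log = [
    ("u1", datetime(2022, 1, 1, 18, 0)),
    ("u1", datetime(2022, 1, 2, 18, 0)),
    ("u1", datetime(2022, 1, 4, 18, 0)),
]
gaps = time_to_next_charge(log)
```

In this toy log the user skips one day, so the two gaps are 24 h and 48 h, which mirrors the sub-peaks at 24 h intervals described above.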

Summary of EV Charging Behavior
The variables related to EV charging sessions exhibit different distribution patterns. The charging demand and duration are highly related and follow similar skewed patterns, with a concentration around the low values and long tails that have a hard limit due to the battery capacity. The connection duration, idle duration, and time to next charge exhibit highly irregular distributions, with large standard deviations relative to their means, especially for the time to next charge. Table 2 provides the mean, median, and standard deviation of the variables that are predicted in the following Section. The large variance in charging behavior between individual sessions represents a potential challenge for the prediction of these variables.
The connection durations are significantly longer than charging durations on average, and this is reflected in the large idle durations. This discrepancy could be utilized to offset

EV Charging Behavior Prediction
This Section aims to assess the feasibility of predicting session parameters using only the information available at the start of the charging session. The four target variables are listed in Table 3. The accurate prediction of these four variables allows for the direct calculation of all the other unknown session parameters, such as the departure time and idle duration. Each output variable is treated as a function of all the known input parameters. Regression is used to approximate this underlying functional relationship, creating a model that maps the set of inputs for each charging session to the predicted output for that session. Figure 8 illustrates the overall process of training and testing these predictive models using machine learning.
The user ID is deliberately omitted as an input variable to investigate the reliance of charging behavior on general user statistics, rather than establishing a unique functional relationship for each user. When user ID is included as a categorical variable, the number of features can grow to intractable amounts as the user base grows. Omitting the user ID may result in a lower accuracy, but this approach facilitates greater generalization across larger populations, ensures real-time prediction capability, and enables the exploration of shared charging behavior patterns among users. Essential session-specific statistics, such as the historical mean, maximum, minimum, and most recent values for each target variable (connection duration, charging demand, etc.), are computed based on the individual users' behavior. The additional factors considered are the number of previous charging sessions, the average charging frequency for each user, and the time elapsed since the last session ended. Finally, several numerical and categorical temporal variables are included to describe the time of session start: the absolute time series value, the time of day, the day of the week, month, and season. The full set of input parameters is given in Table 4, alongside their abbreviations and descriptions. For predicting each target variable, only the previous statistics of that variable are included as model inputs; for example, the mean connection duration is not used as an input for predicting the charging duration. Thus, each model uses twelve input parameters, but four of these are specific to each target variable. Three machine learning algorithms, namely Gradient Boosting (XGBoost), Random Forest (RF), and Artificial Neural Network (ANN), are investigated for predicting the charging behavior. These algorithms strike a balance between accuracy and computational speed, making them suitable for real-time applications.
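The four target-specific historical inputs (mean, maximum, minimum, and most recent value) could be computed along the lines of the sketch below. It is a minimal illustration, not the authors' implementation: the function name and the dictionary keys are hypothetical and do not correspond to the abbreviations in Table 4. Each row is built only from sessions that precede it, so the first session of a user produces no row.

```python
def history_features(values):
    """Build per-session history features for one user and one target.

    `values` holds the target variable (e.g. connection duration in hours)
    for a single user's sessions in chronological order. One feature row is
    returned per session from the second onward, using earlier sessions only.
    """
    rows = []
    total = 0.0
    highest = float("-inf")
    lowest = float("inf")
    for n_prev, value in enumerate(values):
        if n_prev > 0:
            rows.append({
                "n_prev": n_prev,              # number of previous sessions
                "hist_mean": total / n_prev,   # mean of earlier values
                "hist_max": highest,
                "hist_min": lowest,
                "hist_last": values[n_prev - 1],
                "target": value,               # value to be predicted
            })
        # Update the running statistics only after the row is emitted,
        # so the current session never contributes to its own features.
        total += value
        highest = max(highest, value)
        lowest = min(lowest, value)
    return rows


rows = history_features([2.0, 4.0, 3.0])
```

Updating the running statistics after the row is written is what keeps the target of a session out of its own inputs, which is the property the framework relies on for prediction at session start.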
Machine learning models encounter challenges, notably the risk of overfitting, where excessive complexity hampers generalization to new data. Appropriate tuning and validation are essential to address this concern. Additionally, interpreting the machine learning results can be challenging due to a lack of an analytical or statistical relationship between the inputs and outputs. Linear regression is used in this study as a reference point. However, this incorrectly assumes a linear relationship between the dependent and independent variables and cannot accurately model the effect of categorical variables.

Data Processing and Splitting
Before inputting the data into the machine learning model, thorough data cleaning and splitting processes are crucial for ensuring the analysis' high quality and reliability. Cleaning involves addressing the missing or potentially erroneous values, while splitting divides the dataset into multiple groups for training and testing.
In addition to the initial cleaning described in Section 3, the data were further trimmed for regression modeling.The aim of this predictive framework is to assess the degree to which the users' behavior might be predictable with enough prior information.To this end, the users were only included in the regression dataset if they charged at least 500 times during the data-recording period.While this approach does exclude infrequent chargers (the minimum charging frequency is about once every 3 days), it ensures that sufficient data are available for each user for meaningful regression.The final dataset contains 161,948 sessions and 183 unique users.
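As a rough illustration of the user filter, the sketch below keeps only users with at least a given number of sessions and reproduces the "about once every 3 days" figure. The column name `user_id` and the helper `keep_frequent_users` are hypothetical, and the recording period is assumed to be exactly 4 years (the length mentioned in the Conclusions).

```python
import pandas as pd

MIN_SESSIONS = 500  # threshold stated in the text

def keep_frequent_users(sessions: pd.DataFrame, min_sessions: int = MIN_SESSIONS) -> pd.DataFrame:
    """Keep only the sessions of users with at least `min_sessions` sessions."""
    n_sessions = sessions.groupby("user_id")["user_id"].transform("count")
    return sessions[n_sessions >= min_sessions]

# Implied minimum charging frequency, assuming a 4-year recording period.
days_recorded = 4 * 365
days_per_session = days_recorded / MIN_SESSIONS
print(round(days_per_session, 2))  # 2.92

# Toy example with a lower threshold so the effect is visible.
demo = pd.DataFrame({"user_id": ["a", "a", "a", "b"]})
frequent = keep_frequent_users(demo, min_sessions=2)
```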
Splitting the data into training and testing sets is essential for detecting overfitting. The training set, encompassing 80% of the data, is where the model learns patterns and relationships from the user data. During this phase, the model optimizes its parameters to minimize the prediction error. The remaining 20% of the data is used to test the model's performance. This set consists of new, unseen data, so an overfit model cannot yield deceptively accurate predictions. Evaluating the model on this independent dataset provides a robust estimate of its generalization capability and of its reliability in real-world scenarios.
How sessions are allocated to each subset is an important decision that shapes the model's performance. In the current framework, the dataset is first sorted by user, and the initial session of each user is excluded. This discarded session serves as the starting point for calculating the "historical" parameters for each user, such as the mean, maximum, and minimum values of the target variables, along with the time elapsed since the last charge. The first 80% of each user's charging sessions, in chronological order, are assigned to the training set. This ensures that the model learns from a substantial portion of each user's charging history and that both the training and testing sets contain sufficient data for each user. The remaining 20% of each user's charging sessions are allocated to the testing set. This segment simulates "future" charging behavior, providing an evaluation of the model's capabilities on unseen data. No information is leaked from the test set to the training set, as all the features are calculated based only on previous sessions.
Each model in this study is implemented using the Python programming language, which offers a comprehensive set of libraries and tools for machine learning tasks. The code for model implementation is organized and executed within Jupyter Notebook, an interactive environment that facilitates code development, visualization, and documentation. For model training and evaluation, several packages from the Scikit-learn library are utilized [31]. Scikit-learn provides convenient functions for training models and tuning parameters for the Linear, XGBoost [32], Random Forest [33], and Artificial Neural Network [34] methods.
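The per-user chronological split can be sketched as follows. This is a minimal illustration under assumed column names (`user_id`, `start_time`); the function `chronological_split` is hypothetical, and details such as how fractional session counts are rounded are not specified in the text.

```python
import pandas as pd

def chronological_split(sessions: pd.DataFrame, train_frac: float = 0.8):
    """Drop each user's first session, then split the rest 80/20 in time order."""
    df = sessions.sort_values(["user_id", "start_time"])
    position = df.groupby("user_id").cumcount()  # 0 = the user's first session
    n_sessions = df.groupby("user_id")["user_id"].transform("count")
    usable = position >= 1  # the first session only seeds the history features
    in_train = (position - 1) < train_frac * (n_sessions - 1)
    return df[usable & in_train], df[usable & ~in_train]

# Toy data: user "a" has 11 sessions, user "b" has 6 (invented dates).
demo = pd.DataFrame({
    "user_id": ["a"] * 11 + ["b"] * 6,
    "start_time": list(pd.date_range("2021-01-01", periods=11, freq="D"))
    + list(pd.date_range("2021-02-01", periods=6, freq="D")),
})
train, test = chronological_split(demo)
print(train.groupby("user_id").size().to_dict(), test.groupby("user_id").size().to_dict())
```

Because the split is made within each user, every test session of a user is later in time than all of that user's training sessions.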
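A minimal sketch of how such a model comparison might be set up with scikit-learn is given below. It uses synthetic data rather than the charging dataset, the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingRegressor` is used as a stand-in for XGBoost so that the example depends on a single library; none of these choices are confirmed by the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a deliberately non-linear dependence on the first feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(600, 4))
y = 10 * np.abs(X[:, 0]) + 2 * X[:, 1] + rng.normal(0, 0.5, 600)
X_train, X_test = X[:480], X[480:]
y_train, y_test = y[:480], y[480:]

models = {
    "Linear": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    # Neural networks are sensitive to feature scale, hence the scaler.
    "ANN": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
    ),
}

# .score() returns R^2 on the held-out data for every scikit-learn regressor.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.2f}")
```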

EV Charging Behavior Prediction Results
The following subsections quantify the predictive performance of the models trained on the dataset for each of the four target variables. A total of five predictive models are considered: three machine learning models (RF, XGBoost, and ANN), linear regression, and a baseline model referred to as the "Mean Model". The "Mean Model" simply assumes that the target variable for each session is equal to the historical mean for that user, calculated from the user's prior sessions. This model serves as a reference point to assess whether regression provides a meaningful increase in accuracy beyond this simple assumption. For each target variable, the following results are provided:
Accuracy Metrics: After applying each model to the test data, the R², mean absolute error (MAE), and root mean square error (RMSE) are provided in a table to quantitatively compare the performance of each model. Higher values of R² and lower values of MAE/RMSE indicate more accurate predictions.
Scatter Plots: The predicted values are plotted against the actual values for the linear model and the best-performing ML model. This provides a more comprehensive visualization of model performance than the accuracy metrics alone, allowing for the identification of patterns, clusters, and ranges of higher or lower accuracy.
Feature Importance Plot: The models are trained on a large, comprehensive set of input features, and it is important to identify which of these are actually useful for predicting charging behavior. A bar chart is provided for the most accurate ML model, highlighting the relative importance of each feature, as determined by Gini importance.
Recursive Feature Elimination (RFE): RFE is performed using the best-performing ML model to refine the feature selection process iteratively. In contrast to the feature importance calculations, this method determines the most relevant features by systematically eliminating the less important ones, running the model with fewer features at each iteration. The accuracy of the model (quantified by adjusted R²) is plotted versus the number of included features to determine how the accuracy increases with each feature added.
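The accuracy metrics listed above, together with the adjusted R² used in the RFE plots, can be computed as in the following sketch. The helper `evaluate` and the numbers are illustrative only; the example scores a "Mean Model" style prediction in which every session is predicted with the same historical mean.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred, n_features: int) -> dict:
    """Return R^2, adjusted R^2, MAE, and RMSE for one set of predictions."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    # Adjusted R^2 penalizes R^2 for the number of input features used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {
        "R2": r2,
        "adj_R2": adj_r2,
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }

# Invented connection durations (min) and a constant historical-mean prediction.
observed = np.array([100.0, 200.0, 300.0, 400.0])
mean_model_prediction = np.full(4, 250.0)
metrics = evaluate(observed, mean_model_prediction, n_features=1)
print(metrics)
```

In this toy case the constant prediction equals the mean of the observations, so R² is exactly zero, which is the behavior expected of a baseline that carries no session-specific information.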

Connection Duration
Table 5 provides the accuracy metrics for each method of predicting the connection duration. The RF algorithm outperformed the other approaches, as evidenced by lower values for the MAE (210 min) and RMSE (304 min). However, the R² value of 0.39 indicates that only roughly 40% of the variance in connection duration is explained by the model. The MAE represents a relatively high percentage of the average connection duration (40%), indicating that prediction errors are often large relative to the actual connection durations. Nevertheless, the RF model represents a significant improvement in accuracy over both the "Mean Model" and linear regression, with an MAE 25% lower than that of the linear model.
Figure 9 presents a scatter plot of the predicted and observed connection durations for the linear and RF models. Values closer to the dotted line indicate more accurate predictions. In both plots, clustering is evident for both the observed values and the predictions. A narrower vertical spread of predictions in the linear plot indicates that the linear model is not as responsive to the input features. Both models consistently overpredict low connection durations and underpredict higher durations, but RF performs notably better than the linear model for connection durations of less than 400 min.
The RFE plot in Figure 11 illustrates the increase in model accuracy (as measured by adjusted R²) for the RF method as more features are included. There is a rapid increase in accuracy for up to 3-4 features. Including six features still increases the model accuracy slightly, but all the features beyond this provide almost no improvement to the model. The six most significant features as determined by the RFECV process are shown alongside the plot and are the same as those determined by Gini importance in Figure 10, though with slight differences in ranking.
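The two feature-selection views used here, impurity-based importance and recursive elimination with cross-validation, can be sketched with scikit-learn as follows. The data are synthetic, the feature names are hypothetical placeholders, and the selection is scored with plain R² rather than the adjusted R² plotted in the paper. For a regression forest, `feature_importances_` is the mean decrease in impurity, which is what is commonly referred to as Gini importance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV

# Synthetic data in which only the first two features carry signal.
names = ["mean_target", "time_of_day", "abs_time", "frequency", "season", "day_of_week"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, len(names)))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.3, 300)

# Impurity-based importance from a single fitted forest.
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
importance = dict(zip(names, rf.feature_importances_))
ranked = sorted(importance, key=importance.get, reverse=True)

# Recursive elimination: drop the weakest feature one at a time and use
# 3-fold cross-validation to choose how many features to keep.
selector = RFECV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    step=1, cv=3, scoring="r2",
).fit(X, y)
selected = [name for name, keep in zip(names, selector.support_) if keep]

print("Importance ranking:", ranked)
print("Features kept by RFECV:", selected)
```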

Charging Duration
Table 6 summarizes the accuracy metrics for each method of predicting the charging duration. The RF model again outperforms the other methods, with the highest R² value of 0.47 and the lowest MAE and RMSE values of 40 and 54 min, respectively. The XGBoost method shows a similar performance. All the ML models show a significant improvement over the linear and Mean Models, but the predictive accuracy is still mediocre, capturing only 47% of the variance in charging duration. The MAE of the RF model (40 min) represents 31% of the average charging duration (128 min), indicating a moderately large average error relative to the size of the target variable. Figure 12 provides a comparison between the predicted and observed charging durations for the linear and RF models. The RF predictions are visibly more clustered around the dotted line, but there is still a large spread. Both models struggle with extreme values, as short charging durations are often significantly overpredicted, and long charging durations are underpredicted.
The feature importance plot for the RF method is shown in Figure 13. The mean charging duration is the most significant feature for predicting the charging duration of each session by a large margin. Similar to the connection duration, the time of day and absolute time series are also important. In direct contrast to the connection duration, however, the time elapsed since the last charge is the next most significant, which is consistent with the assumption that vehicles that have not been charged recently will need longer to recharge. The categorical temporal variables (season, month, and day) have little effect on the charging duration predictions, in addition to the historical minimums and maximums.
Figure 14 illustrates the RFE results. The prediction accuracy increases with each feature added until six features, after which more features do not improve the model performance. The six most important features determined by RFE and feature importance are the same, although the frequency of charging is ranked higher by the RFE.

Charging Demand
The accuracy metrics across all the models for predicting the charging demand are summarized in Table 7. Once again, the RF model exhibits the best overall performance across all the metrics, with an R² of 0.48, an MAE of 3.6 kWh, and an RMSE of 5.1 kWh. Both RF and XGBoost show a notable improvement over the linear and Mean Models, but the predictive accuracy is still not high. The average error of the RF model is 28% of the average charging demand in the dataset.
In Figure 15, the plots of predicted versus observed charging demands show a distinct improvement with RF over linear regression. The linear model is highly flat, underpredicting all the high values and overpredicting most of the low ones. The RF predictions are consistently closer to the actual values, but the model still struggles with extremely high or low charging demands.
The RFE plot in Figure 17 follows a similar trend to the previous target variables. The model sees significant increases in accuracy with each feature added, until saturating at 5-6 features. The most significant features, as determined by recursive elimination, are again the same as those determined by Gini importance, but in a slightly different order. This set of features is also identical to the most important features for charging duration prediction, highlighting the similarity of these two variables. In practice, it may be sufficient to perform predictive modeling on only one of these variables and calculate the other based on the average charging power.
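The conversion between charging duration and charging demand mentioned above is simple arithmetic, sketched below. The helper `demand_from_duration` is hypothetical. The implied averages are back-calculated from figures reported in the text (an MAE of 3.6 kWh stated to be 28% of the average demand, and an average charging duration of 128 min); the ratio of these two averages is only a rough proxy for the average charging power of an individual session.

```python
def demand_from_duration(charging_min: float, avg_power_kw: float) -> float:
    """Energy delivered (kWh) = average power (kW) x charging time (h)."""
    return avg_power_kw * charging_min / 60.0

mean_duration_min = 128.0             # average charging duration from the text
implied_mean_demand_kwh = 3.6 / 0.28  # MAE of 3.6 kWh is 28% of the mean demand
implied_avg_power_kw = implied_mean_demand_kwh / (mean_duration_min / 60.0)
print(round(implied_mean_demand_kwh, 1), round(implied_avg_power_kw, 1))  # 12.9 6.0

# Converting a predicted 90-minute charging duration into an energy estimate.
print(round(demand_from_duration(90, implied_avg_power_kw), 1))
```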

Time until Next Charge
Table 8 summarizes the performance of each model for the prediction of time until the next charging session. This parameter is significantly more difficult to predict than the previous target variables. In addition to the target having a very wide distribution (the standard deviation is 88% of the mean), the best-performing ML models (RF and XGBoost) only explain roughly 20% of the variance in the output, as quantified by their R² values. These models are only a very slight improvement over the "Mean Model" and linear regression, and for all the methods, the predicted values are largely uncorrelated with the actual observations. Nevertheless, of the approaches considered, the RF/XGBoost models provide the best predictions possible given the information available at the start of each session, and the RMSE values of the models are significantly lower than the standard deviation of the data, making modeling a much better approach than assuming constant values. The predicted versus observed values are plotted for the linear and RF models in Figure 18. The RF results exhibit only a slight improvement over the linear model. In both cases, the predictions have a flat distribution and are largely uncorrelated with the observations. While there are noticeable clusters in the observed target values, these clusters are not present in the predictions.
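As a rough consistency check, R² and the ratio of RMSE to the standard deviation are directly linked when both are computed on the same set of observations. The relation below assumes that σ_y is the population standard deviation of the test targets and ȳ their mean; the 88% figure quoted above may have been computed on the full dataset, so the final number is only indicative.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
    = 1 - \frac{\mathrm{RMSE}^2}{\sigma_y^2}
\quad\Longrightarrow\quad
\frac{\mathrm{RMSE}}{\sigma_y} = \sqrt{1 - R^2} \approx \sqrt{1 - 0.20} \approx 0.89,
\qquad
\mathrm{RMSE} \approx 0.89 \times 0.88\,\bar{y} \approx 0.79\,\bar{y}
```

Under these assumptions, an R² of about 0.20 corresponds to an RMSE roughly 11% below the standard deviation of the target.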

The feature importance results for the RF method are plotted in Figure 19. The mean time until next charge is the most significant feature, followed by the previous value and time of day. The temporal categorical values, along with minimum and maximum, remain the least important features for prediction.
Figure 20 shows the results of RFE. Here, near-maximum accuracy is obtained with only 3-4 features, and additional features beyond 6 can actually decrease the adjusted R². For this model, a smaller set of features is likely appropriate due to the lack of correlation between most input parameters and the target variable.

Conclusions
The comprehensive analysis of EV charging behavior is crucial to accommodate the increasing penetration of EVs. The trends in users' behavior provide an important baseline for the planning of charging infrastructure, electric grid upgrades and operational strategies, public and private incentive programs, and commercial opportunities. Looking at 4 years of residential charging data from Omaha, NE, this study found that EV users have a large degree of variance in their home charging behavior. Important parameters, such as the start and end times, connection duration, and time between sessions, have wide and irregular distributions. The charging duration and charging demand are highly correlated and have skewed distributions, with many short sessions but a large range. The connection durations are significantly longer than the charging durations in households, and this idle time could be utilized for more efficient scheduling of charging or for vehicle-to-grid operations.
The high variance in charging behavior creates both the need to predict the relevant variables and the challenge of doing so accurately. This study examines the feasibility of predicting four key charging parameters at the time of EV plug-in using only the previous information that is commonly recorded by charging service providers. The framework is constructed to avoid computationally expensive user-specific models, relying instead on statistically engineered input parameters for each user. These parameters are related to the target variables through the training of machine learning models and tested on an unseen portion of the same dataset.
The Random Forest algorithm yields the most accurate predictions for each variable in this case study. XGBoost exhibits a very similar performance, and the Artificial Neural Network is only slightly behind, with average errors that are at most 15% higher. The connection duration, charging duration, and charging demand can be predicted with moderate accuracy, with R² values ranging from 0.40 to 0.48. This marks a significant improvement over the assumption that the charging behavior follows a stable average, as well as over the linear model. The average errors are still significant, however, indicating that individual residential charging behavior is still highly random with respect to the information known to the charging service providers. This is especially evident in the final parameter studied: time to next charge. With an R² of 0.21 or lower, even the machine learning models are incapable of accurately predicting the next session's start time for an individual user, given the information considered in this study. While this framework yields positive results for predicting the duration and demand of the current session, anticipating when the next session will start is likely better handled by alternative frameworks.
While this study examines 12 possible input parameters for each model, optimal accuracy is typically achieved with just 4-6. The consistently important input parameters include the user's historical average for the variable being predicted, the start time of the session, and the frequency with which the user charges. The time elapsed since the last session is useful for predicting the charging duration and demand, but it is not related to the connection duration. Temporal patterns are best handled with an absolute-time-series

Figure 1. The percentage of total household charging sessions with a given start time.

Figure 2. The percentage of total household charging sessions with a given end time.

Figure 3. The percentage of total household charging sessions with a given connection duration.

Figure 4. The percentage of total household charging sessions with a given charging duration.

Figure 5. The percentage of total household charging sessions with a given idle duration.

Figure 6. Histogram of charging demand for each session.

Figure 7. Histogram of time to next charge for each session.

Energies 2024 ,
17, 925 8 of 20 charging until off-peak hours through incentives or centralized scheduling, or to utilize EVs as storage in vehicle-to-grid operations.
… | … | The energy consumed during the session, in kWh
Connection Duration | Numeric | The connection duration of the session, in minutes
Charging Duration | Numeric | The charging duration of the session, in minutes
Time to Next Charge | Numeric | The time until the next session begins, in minutes

Figure 10 shows the relative importance of each feature in the RF model. The time of day, mean connection duration, and the absolute time series value are the most significant for predicting connection duration. The season has almost no impact, and it is worth noting that the time elapsed since the last charge also has very little effect on the connection duration.
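
As an illustration of how such a ranking can be obtained, the following is a minimal sketch that assumes scikit-learn's RandomForestRegressor and purely synthetic data. The feature names and the synthetic relationship are hypothetical stand-ins and are not the paper's dataset or reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Hypothetical feature names, chosen only to echo the discussion above.
feature_names = ["time_of_day", "mean_connection_duration", "season"]
X = np.column_stack([
    rng.uniform(0, 24, n),    # time of day, in hours
    rng.uniform(60, 900, n),  # historical mean connection duration, in minutes
    rng.integers(0, 4, n),    # season encoded as 0..3
])

# Synthetic target (an assumption): depends on the first two features only.
y = 20.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 10, n)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1.
importances = dict(zip(feature_names, rf.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)

for name in ranked:
    print(f"{name}: {importances[name]:.3f}")
```

In this toy setup the season column carries no signal, so it should fall to the bottom of the ranking, mirroring the qualitative pattern described for the connection duration model.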

Figure 10. Feature importance of connection duration prediction for each method.

Figure 11. RFE analysis of RF model for prediction of connection duration. Accuracy vs. feature count (left) and 6 most important features (right).

Figure 13. Feature importance of charging duration predictions for each method.

Figure 14 illustrates the RFE results. The prediction accuracy increases with each feature added up to six features, after which additional features do not improve the model performance. The six most important features determined by RFE and by feature importance are the same, although the frequency of charging is ranked higher by RFE.
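
The recursive feature elimination (RFE) procedure can be sketched as follows, assuming scikit-learn's RFE wrapper around a random forest. The data are synthetic, with two informative columns out of six, so the selected indices illustrate the mechanism only and say nothing about the paper's actual features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(1)

# Synthetic data (an assumption): columns 0 and 1 drive the target,
# columns 2..5 are pure noise.
X = rng.normal(size=(300, 6))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0, 0.1, 300)

# RFE refits the forest repeatedly, dropping the least important
# feature (step=1) until the requested number remains.
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    n_features_to_select=2,
    step=1,
)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]
print("selected feature indices:", selected)
print("elimination ranking (1 = kept):", selector.ranking_.tolist())
```

An accuracy-versus-feature-count curve like the one described would additionally require scoring the model at each value of `n_features_to_select`, for example with cross-validation.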

Figure 14. RFE analysis of RF model for prediction of charging duration. Accuracy vs. feature count (left) and 6 most important features (right).

In Figure 15, the plots of predicted versus observed charging demands show a distinct improvement with RF over linear regression. The linear model's predictions are nearly flat, underpredicting all the high values and overpredicting most of the low values. The RF predictions are consistently closer to the actual values, but the model still struggles with extremely high or low charging demands.
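
The flat-prediction behavior of a linear model on a nonlinear target can be reproduced on toy data. The sketch below assumes scikit-learn and a synthetic quadratic relationship chosen for illustration; it does not use the paper's features or reproduce its accuracy figures.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)

# Synthetic nonlinear target (an assumption): a linear fit cannot follow it.
X = rng.uniform(-1, 1, size=(600, 1))
y = 10.0 * X[:, 0] ** 2 + rng.normal(0, 0.3, 600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

lin = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

lin_pred = lin.predict(X_test)
rf_pred = rf.predict(X_test)

lin_r2 = r2_score(y_test, lin_pred)
rf_r2 = r2_score(y_test, rf_pred)

# A "flat" model shows up as a narrow range of predictions
# compared with the range of observed values.
lin_spread = lin_pred.max() - lin_pred.min()
obs_spread = y_test.max() - y_test.min()

print(f"linear R2 = {lin_r2:.3f}, RF R2 = {rf_r2:.3f}")
print(f"linear prediction spread = {lin_spread:.2f}, observed spread = {obs_spread:.2f}")
```

Plotting `lin_pred` and `rf_pred` against `y_test` would give the two-panel predicted-versus-observed view, with the linear panel collapsing toward a horizontal band.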

Figure 15. Predicted vs. observed charging demands for linear (left) and RF (right) models.

Figure 16 displays the relative feature importance for the input parameters in the RF model. As with the charging duration, the mean charging demand is the most significant input parameter by a large margin. The time of day and time since last charge are the next most impactful. Again, as with the charging duration, the categorical variables of season, month, and day of the week are the least important, along with the historical minimums and maximums.

Figure 16. Feature importance of charging demand predictions for each method.

Figure 17. RFE analysis of RF model for prediction of charging demand. Accuracy vs. feature count (left) and 6 most important features (right).

Figure 18. Predicted vs. observed times until the next session for linear (left) and RF (right) models.

Figure 19. Feature importance of time-until-next-session predictions for each method.

Figure 20. RFE analysis of RF model for prediction of time until next charge. Accuracy vs. feature count (left) and 6 most important features (right).

Table 1. Yearly summary of household charging.

Table 2. Summary of the distribution of charging behavior variables.

Table 3. Target variables for each charging session.

| Variable | Type | Description |
|---|---|---|
| Charging Demand | Numeric | The energy consumed during the session, in kWh |
| Connection Duration | Numeric | The connection duration of the session, in minutes |
| Charging Duration | Numeric | The charging duration of the session, in minutes |
| Time to Next Charge | Numeric | The time until the next session begins, in minutes |
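
To make the four target definitions concrete, the sketch below derives them from a small, made-up session log using only the Python standard library. The field names (`plug_in`, `charge_end`, `plug_out`, `energy_kwh`) are hypothetical, and measuring the time to next charge from the current session's plug-in time is an assumption, since the reference point is not stated here.

```python
from datetime import datetime

# Hypothetical session log for one household; all values are made up.
sessions = [
    {"plug_in": datetime(2023, 1, 1, 18, 0),
     "charge_end": datetime(2023, 1, 1, 20, 30),
     "plug_out": datetime(2023, 1, 2, 7, 0),
     "energy_kwh": 17.5},
    {"plug_in": datetime(2023, 1, 3, 19, 0),
     "charge_end": datetime(2023, 1, 3, 20, 0),
     "plug_out": datetime(2023, 1, 4, 8, 0),
     "energy_kwh": 7.0},
]

def minutes(delta):
    """Convert a timedelta to minutes."""
    return delta.total_seconds() / 60.0

def build_targets(sessions):
    """Return one row of target variables per session, in chronological order."""
    ordered = sorted(sessions, key=lambda s: s["plug_in"])
    rows = []
    for i, s in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        rows.append({
            "charging_demand_kwh": s["energy_kwh"],
            "connection_duration_min": minutes(s["plug_out"] - s["plug_in"]),
            "charging_duration_min": minutes(s["charge_end"] - s["plug_in"]),
            # Assumption: measured from this session's plug-in time.
            # The last session has no successor, so the value is None.
            "time_to_next_charge_min":
                minutes(nxt["plug_in"] - s["plug_in"]) if nxt else None,
        })
    return rows

targets = build_targets(sessions)
for row in targets:
    print(row)
```

The difference between the connection duration and the charging duration of a row gives the idle time, during which the vehicle is plugged in but not drawing energy.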

Table 4. Input parameters for each charging session.

Table 5. Accuracy metrics for predicting connection duration.

Table 6. Accuracy metrics for predicting charging duration.

Table 7. Accuracy metrics for predicting charging demand.

Table 8. Accuracy metrics for predicting the time until the next session, in minutes.
