A Review of Electric Vehicle Load Open Data and Models

Abstract: The field of electric vehicle charging load modelling has been growing rapidly in the last decade. In light of the Paris Agreement, it is crucial to keep encouraging better modelling techniques for successful electric vehicle adoption. Additionally, numerous papers highlight the lack of charging station data available in order to build models that are consistent with reality. In this context, the purpose of this article is threefold. First, to provide the reader with an overview of the open datasets available and ready to be used in order to foster reproducible research in the field. Second, to review electric vehicle charging load models with their strengths and weaknesses. Third, to provide suggestions on matching the models reviewed to six datasets found in this research that have not previously been explored in the literature. The open data search covered more than 860 repositories and yielded around 60 datasets that are relevant for modelling electric vehicle charging load. These datasets include information on charging point locations, historical and real-time charging sessions, traffic counts, travel surveys and registered vehicles. The models reviewed range from statistical characterization to stochastic processes and machine learning and the context of their application is assessed.


Introduction
Assuming a low-carbon energy mix, Electric Vehicles (EVs) are a credible alternative to internal combustion engine vehicles (ICEVs) supporting the transportation sector in its low-carbon transition. A substantial number of governments are heavily investing in electric mobility with more than 5.1 million electric passenger cars on the roads globally in 2018, according to the International Energy Agency (IEA) [1]. Several countries are achieving high rates of EV adoption such as Norway which approached an EV market share of almost 47% in 2019 [1]. This is due in large part to major incentives implemented by governments to foster EV uptake [2]. The EV30@30 Campaign [3] sets a target of 30% EV market share by 2030 for the member countries of the Electric Vehicle Initiative (EVI) [4]. This enthusiasm for EVs comes hand in hand with great concern about how to manage the surge in electricity demand which could greatly disrupt the current schedule [5].
In order to overcome potential pitfalls, businesses and researchers are proposing solutions including pricing strategies [6] and smart charging [7]. The goal of these solutions is to avoid dramatically shifting EV users' behaviours and power plants' production schedules. However, their implementation requires a precise understanding of charging behaviours. Thus, EV load models are necessary in order to better understand the impacts of EVs on the grid. With this information, the merit of EV charging strategies can be realistically assessed.
In this article, the term "EVs" refers to small vehicles (e.g., light motorcycles), passenger vehicles (e.g., cars) and goods-carrying vehicles (e.g., trucks) as per the classification from the European Commission's official report "Mobility and Transport: Vehicle Categories" [8,9]. Passenger vehicles constitute the majority of EVs. Additionally, all types of energy system are considered: Battery EV, Fuel-Cell EV or Plug-In Hybrid EV [10]. Furthermore, the term Electric Vehicle Supply Equipment (EVSE) will refer to any type of charging point, be it public or private. Finally, an EV charging session (or transaction) refers to the period of time an EV has spent charging at an EVSE.

Aims and Strategies for EV Charging Schemes
Electricity distribution occurs such that at any point in time and space, the consumption has to be equal to the production in order to avoid severe consequences such as blackouts [11]. A significant rise in the number of EVs in circulation leads to an increase in electricity demand which could cause such a blackout if the balance in the grid is not effectively maintained. Therefore, EVs have an important role to play in maintaining this balance [5]. The purpose of this section is to explore the different aims and strategies required to overcome the potential difficulties caused by increased EV penetration. Figure 1 summarizes these aims and strategies. Incentivized flexibility and controlled flexibility are used to achieve specific aims while uncontrolled charging lets the market decide the prioritization of these aims. Load flattening and load balancing are the most common aims found in the literature and they are the focus of the following paragraphs.

Load Flattening
While some studies show minimal impact of EVs on peak load [12,13], the consensus in the field is that the grid will not be able to sustain its operations with the projected demand from EVs [6,9,14–19].
One of the first articles dealing with the impact of EVs on load management was written in 1983 [20]. In this article, EVs were suggested as a way to maximize the overall grid load factor f. This factor is defined as the ratio of the average load (L) over the maximum load in a given period of time: f = avg(L)/max(L). The maximization of this quantity results in a more efficient distribution of resources over time. The article proposed that off-peak recharging of EVs would significantly increase the load factor. This means shifting the EV demand to times when the rest of the demand is low (e.g., night time) in order to flatten the load curve. The flexibility analysis produced in [21] suggests that it is possible to shift the EV charging to the afternoon and night valleys for different clusters of users without changing their behaviours. This could lead to peak reduction and load factor maximization with little change to users' requirements and lifestyles. Articles such as [22] strived to estimate the benefits of this kind of controlled or incentivized EV charging. However, these articles do not always account for potential errors in load forecasting, therefore the benefits calculated could be inaccurate. Hence, it is critical to improve EV load forecasting models in order to alleviate the risk of unrealistic optimization schedules for maximizing the load factor.
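The effect of off-peak charging on the load factor can be illustrated with a minimal sketch. The base load values and the two EV charging profiles below are hypothetical numbers chosen for illustration, not data from [20] or [21]; the function name `load_factor` is likewise an assumption of this sketch.

```python
from statistics import mean

def load_factor(load):
    """Return f = avg(L) / max(L) for a sequence of load values (e.g., in kW)."""
    if not load:
        raise ValueError("load profile must not be empty")
    peak = max(load)
    if peak <= 0:
        raise ValueError("peak load must be positive")
    return mean(load) / peak

# Hypothetical base load over six equal periods (kW).
base = [40, 30, 30, 60, 100, 80]

# The same EV energy delivered either at the peak or in the valley.
ev_at_peak = [0, 0, 0, 0, 40, 0]    # uncontrolled charging coinciding with the peak
ev_off_peak = [0, 20, 20, 0, 0, 0]  # charging shifted to the low-demand periods

peak_profile = [b + e for b, e in zip(base, ev_at_peak)]
valley_profile = [b + e for b, e in zip(base, ev_off_peak)]

for name, profile in [("base", base),
                      ("EV at peak", peak_profile),
                      ("EV off-peak", valley_profile)]:
    print(f"{name}: f = {load_factor(profile):.3f}")
```

In this toy case the valley-filling profile has a higher load factor than the base load, while charging at the peak lowers it, which is the behaviour the off-peak strategy relies on.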

Load Balancing
An early article from 1997 [23] considered using EVs as a source of electricity for the grid when demand is high. In other words, EVs plugged in to the grid can be used as an ancillary service or as a way to bring flexibility to the overall shape of the load. According to a study focused on 400,000 EV charging transactions from 2012 to 2016 in the Netherlands, 75% of EVs connected at public EVSEs are already fully charged [24]. This study therefore supports the strategy of using fully recharged EVs which are still connected as a source of energy in order to supply the grid. This paradigm shift, using what could be a major constraint and treating it as an opportunity, is called "vehicle-to-grid" (V2G).
Additionally, integrating renewable energy sources onto the grid is also the focus of numerous studies [7]. Many countries with climate related commitments are aiming to increase the share of renewables in their energy mix. However, the main drawback of renewable energies is their intermittent delivery of supply. Indeed, solar panels and wind farms are highly weather-dependent. In this context, EVs can adequately balance the energy coming from renewable power plants. This strategy consists in considering multiple EVs acting as a large battery or electricity storage system which can be discharged back into the grid when weather conditions do not allow renewable power plants to produce enough energy [19].
Although V2G has many advantages, one drawback is that it reduces battery lifetime by adding extra cycles of charge and discharge to the vehicle [25]. Furthermore, this strategy requires the existence of global and local communication and monitoring channels which do not exist yet. These channels are necessary for the development of EVs in general and particularly for V2G and load balancing [26,27]. Finally, in order to ensure effective communication, EV load models are critical as they can reduce uncertainty and minimize contradictions between what is expected and what is observed by operations management.

Paper Structure and Contributions
The purpose of this article is to enable a better understanding of EV load data available and models produced in the literature. The main contributions of this article are as follows:
• The results of an in-depth open data search with a structured list of datasets available for use
• A comprehensive review of EV load models including their strengths, weaknesses and their application in the literature
• A preliminary study on matching EV load models to six open datasets found in this research and not previously explored in the literature
The rest of the article is structured as follows. Section 2 defines EV load and its most common drivers. Section 3 presents the open data found which can be used to model EV load. Section 4 reviews EV load models comparing the different approaches taken. Section 5 explores charging session data not previously explored in the literature and suggestions are provided on the models reviewed that could be applied to these datasets. Finally, Section 6 highlights the current knowledge gaps and discusses the different options in order to pave the way for future work.

EV Load and Its Main Drivers in the Literature
EV load corresponds to the power or energy consumed at EVSEs over time. This information can also be directly derived from other closely related factors. In particular, knowing the arrival time and charge duration of EVs allows a deterministic reconstruction of EV load.
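As a minimal sketch of this reconstruction, the snippet below rebuilds a load curve from arrival times and charge durations. It assumes a constant charging power per session, which is a simplification introduced here; the session values and the function name `reconstruct_load` are hypothetical.

```python
def reconstruct_load(sessions, horizon_h=24, step_h=1.0):
    """Average power (kW) per time bin from (arrival_h, duration_h, power_kw) tuples.

    Assumes each session draws a constant power from arrival until the end
    of its charge duration (a simplifying assumption of this sketch).
    """
    n_bins = int(round(horizon_h / step_h))
    load = [0.0] * n_bins
    for arrival, duration, power_kw in sessions:
        end = min(arrival + duration, horizon_h)
        for i in range(n_bins):
            bin_start, bin_end = i * step_h, (i + 1) * step_h
            # Hours of this session that fall inside the current bin.
            overlap = max(0.0, min(end, bin_end) - max(arrival, bin_start))
            load[i] += power_kw * overlap / step_h
    return load

# Hypothetical sessions: (arrival hour, charge duration in hours, power in kW).
sessions = [(8.0, 2.0, 7.0), (8.5, 1.0, 11.0), (18.0, 3.5, 3.7)]
profile = reconstruct_load(sessions)

# Total energy (kWh) is the sum of bin powers times the bin width.
total_energy_kwh = sum(profile) * 1.0
```

Summing such per-session curves over all EVs or all EVSEs gives the aggregated load, so the same routine can serve either level of aggregation.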

EV Load as a Model Output
EV load can be considered at different levels of aggregation. The total energy demand at all EVSEs can be referred to as the aggregated output of EV load models. The same model output can be envisaged in a disaggregated fashion. Two setups are widely used in practice. The first is vehicle-centric, which considers the contribution of each EV member of a fleet to the aggregated load. The second is EVSE-centric, which considers the perspective of one or multiple EVSEs. The two approaches are not mutually exclusive and can be combined to model EV load.

Aggregated
The aggregated approach is shown in various articles such as [24,28] where the total EV load across multiple EVSEs is modelled. In [24] 1750 charging stations (2900 charging points) are used while [28] uses a single station with many charging piles. This kind of approach usually performs well due to the smoothness of the aggregated load curve assuming there are enough EVs or charging stations in scope. While they give a holistic view of the charging load, they can lack detail with regards to the temporal and spatial distribution of the load which is one of the key concerns raised in the literature [5].

Vehicle-Centric
In order to explore the finer details of EV load, a vehicle-centric approach can be adopted. In [29] individual EV loads are modelled in order to recover the aggregated load. This approach can be qualified as a vehicle-centric approach as it uses individual outputs of EVs. In this case, it is assessed in terms of aggregated load. The same can be said for [30] and [31] where individual behaviours are modelled. A similar study can be found in [32] where the EV load outputs are separated into urban and rural behaviours while [13] looks at public and residential charging. This can give a better understanding of the spatial and temporal properties of EV load.
Additionally, models which consider the spatial components of EV charge are detailed in [9,18]. For instance, in [18] four schedules for EVs are identified which enables one to better distinguish and evaluate their temporal impact on the grid. Furthermore, the spatial dimension is addressed by modelling EV charging locations. Both outputs are brought together in order to reconstruct the aggregated EV load.

EVSE-Centric
The EVSE-centric approach is rare in the literature as it is usually superseded by the vehicle-centric approach. However, there are some occurrences of such work, for instance in [12] where residential charging is envisaged from the perspective of each household. The authors used a bottom-up approach to forecast the aggregated EV load using each household's individual load. The debate of using EVSE-centric over vehicle-centric approaches is illustrated in [33]. In this article, it was found that both approaches yield comparable prediction errors even though the EVSE-centric approach was slower to compute.

Battery
Battery inputs are variables which closely relate to the charging demand of EV load from a "physical/chemical" perspective. The most common ones used across the literature are the State of Charge (SoC), Energy Consumption (E) and battery capacity (C). Generally speaking, the SoC is the level of charge of the battery relative to its capacity, whether the EV is plugged-in, idle or travelling [34]. The SoC when the EV arrives at an EVSE is a critical influential factor of EV demand. This is referred to as the initial SoC (SoC_init) in the literature. By incorporating the distance travelled (D) by the EV, Equation (1) defines SoC_init as follows:

$$\mathrm{SoC}_{init} = \left(1 - \frac{E \cdot D}{C}\right) \times 100 \qquad (1)$$

with C in kWh, E in kWh/km, D in km and SoC_init in %. On one hand, battery capacity and other engine specifications are usually assumed to be known constants. Based on EU MERGE data, probability functions were derived to characterize EV specifications in [9]. On the other hand, the SoC and energy consumption evolve over time and with vehicle usage. Both are highly correlated and they can be deduced from each other from the formula above or by a set of assumptions. For instance, the initial SoC of EVs is assumed to be equal to 0%, 30% or 60% in [35] to match different scenarios. Similarly, the initial SoC is used as an input in [36] along with D, C and the charging rate of the EV charging model in [37]. Furthermore, in [30,38] battery specifications and stochastic characteristics are also part of model inputs. Finally, the EV load itself can be used as an input when considering time series approaches [33].
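The relation between the initial SoC, the distance travelled D, the energy consumption E and the battery capacity C can be sketched as follows. This is a minimal illustration that assumes the EV departs with a full battery and that consumption is constant per kilometre; the function names and the numerical values are hypothetical.

```python
def initial_soc(distance_km, consumption_kwh_per_km, capacity_kwh):
    """Initial SoC (%) on arrival at an EVSE.

    Assumption of this sketch: the trip started with a full battery,
    so the energy used on the road is E * D out of a capacity C.
    """
    if capacity_kwh <= 0:
        raise ValueError("battery capacity must be positive")
    used_kwh = consumption_kwh_per_km * distance_km
    soc = (1.0 - used_kwh / capacity_kwh) * 100.0
    # Clamp to the physically meaningful range.
    return max(0.0, min(100.0, soc))

def energy_to_recharge(soc_init_pct, capacity_kwh, soc_target_pct=100.0):
    """Energy (kWh) needed to go from the initial SoC to a target SoC."""
    return max(0.0, (soc_target_pct - soc_init_pct) / 100.0 * capacity_kwh)

# Hypothetical EV: 40 kWh battery, 0.2 kWh/km, 50 km driven since the last charge.
soc_init = initial_soc(distance_km=50, consumption_kwh_per_km=0.2, capacity_kwh=40)
demand_kwh = energy_to_recharge(soc_init, capacity_kwh=40)
```

With these illustrative inputs the EV arrives at 75% SoC and needs 10 kWh to return to a full battery, which is the quantity that ultimately drives the charging demand.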

Travel
From this literature review, it appears that travel behaviours are the most widely used exogenous factor driving EV load models. It is important to distinguish between travel inputs extracted from travel surveys [14,39] or estimated pattern data [9,30,40] (which usually require further statistical treatment to be part of a model), and real-world traffic patterns (which are deduced either from pilot experiments [41] or direct GPS driving data [42]).
The input variables used in most papers (whether they are estimated or recorded) are the daily distance travelled and travel time. In [42] daily travel distance and individual trip distance distributions were extracted from a survey conducted between 2012 and 2013 in Beijing, with real-world GPS data collected from 112 volunteer vehicle owners. Likewise, a pilot experiment was put in place for a week in Germany [41] in order to record the evolution of daily trips through GPS data.
When such exact data are not available, researchers use travel surveys instead [43]. These datasets hold valuable general information on drivers' behaviours and can be used in order to estimate parameters of statistical distributions for daily distance travelled or travel time. However, they can lack accuracy as the information is usually collected through questionnaires. For instance, in [39] the authors used the 2009 National Household Travel Survey (NHTS) as well as the New York State Transportation Federation Traffic Data Viewer in order to extract traffic statistics such as the speed of EVs travelling from one charging station to another. In [14] the daily trips of a single real-world vehicle from the NHTS are randomly assigned to a fictional EV used in the model. This procedure is repeated for the desired number of EVs to obtain a fictional EV fleet and its traffic. Similarly, [30] used Barcelona's mobility patterns while [44] used the 2008 transportation data from the Dutch Ministry of Transportation in order to extract traffic statistics.
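The random assignment of surveyed trips to fictional EVs can be sketched as sampling with replacement. The snippet below is only an illustration of that idea, not the procedure of [14]: the survey records are made-up values rather than NHTS data, and the function name `assign_trips` is hypothetical.

```python
import random

# Hypothetical survey records: one list of trips per surveyed vehicle,
# each trip given as (departure_hour, distance_km). Not real survey values.
survey_vehicles = [
    [(7.5, 18.0), (17.0, 18.0)],
    [(9.0, 5.0), (12.5, 3.0), (18.0, 8.0)],
    [(6.0, 42.0), (16.5, 42.0)],
]

def assign_trips(survey_vehicles, n_evs, seed=None):
    """Give each fictional EV the daily trips of one randomly drawn surveyed vehicle."""
    rng = random.Random(seed)  # local generator so results are reproducible
    return [list(rng.choice(survey_vehicles)) for _ in range(n_evs)]

fleet = assign_trips(survey_vehicles, n_evs=5, seed=42)

# Daily distance per fictional EV, usable as an input to a battery model.
daily_km = [sum(distance for _, distance in trips) for trips in fleet]
```

Fixing the seed keeps the synthetic fleet reproducible, which matters when the same fleet must be reused across several charging scenarios.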

Weather
EV load models have stemmed from electrical load models, which have been developed for more than 100 years [45] and have some well-established characteristics. One such characteristic is the thermosensitivity of electrical load [46]. In short, this means that clear patterns can be derived from analyzing both load demand and temperature. Thus, it is natural that the most frequently used weather input for EV load models is temperature and its traditionally associated statistics (e.g., average, maximum, minimum) [12]. Even though temperature is used in most electrical load models, it remains rarely used in EV load models. Nevertheless, there exist reasonable arguments to include weather data in EV load models.
The influence of different weather variables on daily EV charging demand is explored in [47]. This includes minimum, maximum and mean daily temperature as well as mean wind speed, maximum gust, rainfall, global radiation and sunny hours. The results of this study showed that temperature, and specifically mean air temperature, is the weather input most correlated with daily EV load, reaching a correlation of 27% in one of the regions considered.
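A screening of this kind can be sketched with a plain Pearson correlation between a weather variable and daily EV load. The source does not state which correlation measure [47] used, so Pearson is an assumption of this sketch, and the temperature and load series below are invented values for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Invented daily series (illustration only, not data from [47]).
mean_temp_c = [2, 5, 9, 14, 18]
daily_ev_load_kwh = [520, 500, 470, 455, 430]

r = pearson(mean_temp_c, daily_ev_load_kwh)
print(f"correlation between mean temperature and daily EV load: {r:.2f}")
```

Running the same function over each candidate weather variable gives a simple ranking of inputs before a more elaborate model is fitted.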
Similarly, in [6] the authors argue that temperature can be used to model EV load as it is correlated to electricity prices and demand. However, there is no mention of other potential weather factors which could be included.
A relational analysis is used in [48] to assess the impact of weather factors on traffic volume in South Korea. It was found in this case study that maximum and average temperature as well as average humidity are the most influential weather factors on traffic volume. Average wind speed on the other hand is less influential and was discarded in their model.
Finally, it is also argued in [49] that temperature has a great impact on EV charging station load while wind and humidity were discarded.

Economy
Amongst the articles covered, only a few include economic factors such as electricity prices [6,28,39], Gross Domestic Product (GDP) [30] or trends [40,47]. While some locations still provide free charging as an incentive to foster EV adoption, most public EVSEs have a charging price based on a subscription or peak/off-peak tariffs. However, China is one of the countries where real-time electricity pricing affects the price consumers pay at EVSEs. Thus, for [28] it is natural to include time-of-use tariffs as this study was made on EVs in China. In [39], the authors also include electricity prices as they can have an impact on the decision made by an EV driver when choosing at which station to charge their vehicle.
Interestingly, GDP is included as a model input in [30] as it was shown in previous work [50–52] that GDP and other socio-economic variables such as place of residence and household characteristics have an impact on EV load and can be leveraged using a vehicle-centric approach. This is something worth exploring as these variables are easily accessible in travel surveys and general country statistics. They can be used to better anticipate charging behaviours in various locations of the grid. Global EV usage trends based on uptake scenarios [40] or calculated trends [47] can also be used as model inputs.

Calendar
Temporal inputs are used in most model set-ups. They are easy to integrate and bring consistency as well as performance with the strong explanatory power they hold. They require no heavy statistical treatment as opposed to other variables (e.g., travel and battery) which makes them easy to use. For instance, in [8,31] day of the week and time of day are used in EV load models and more generally, EV load is derived in most research papers from day of the week, time of day and seasonal variation.
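Deriving such calendar inputs from a timestamp takes only a few lines. The sketch below is one possible encoding, with assumptions of its own: the feature names are hypothetical and the season mapping follows northern-hemisphere meteorological seasons.

```python
from datetime import datetime

# Assumed mapping: northern-hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def calendar_features(ts):
    """Return time-of-day, day-of-week and seasonal inputs for one timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),    # Monday = 0, Sunday = 6
        "is_weekend": ts.weekday() >= 5,
        "month": ts.month,
        "season": SEASON_BY_MONTH[ts.month],
    }

features = calendar_features(datetime(2019, 7, 6, 18, 30))
```

Because these features depend only on the timestamp, they are known in advance for any forecasting horizon, unlike travel or battery inputs.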

Open Data Search
Few review articles dealing with topics related to EV load modelling have included information regarding open data with associated references [53]. To the best of the authors' knowledge, no article at the time of writing has attempted this type of endeavour for EV load models. Indeed, a great majority of articles produced in the EV load modelling domain are based on simulated data or information owned by private entities which are very rarely made available [28,49]. This prevents reproducible work and slows down research in the field. Therefore, the objective is to fill this gap by providing the community with a structured and carefully selected list of open datasets ready to be used in order to foster data-driven research in the field. This open data search was possible in great part thanks to the Open Data Inception initiative, whose website gathers links to more than 3500 open data repositories all across the world [54]. Links to the datasets are provided throughout this section and are up to date at the time of writing.

Research Criteria
This study focuses on datasets which give information on transactions between EVs and EVSEs. In other words, charging sessions.
Additionally, datasets holding information on exogenous variables such as traffic, travel surveys and air quality have also been considered. These variables are widely used in the domain in order to simulate travel behaviours especially when considering spatiotemporal models. Weather data are also used for EV load modelling and electrical load modelling in general [46]. However, this type of information was excluded from this data search as global resources which provide high quality weather data already exist. For example, the riem package [55] written in R retrieves data from airport weather stations all over the world via the Iowa Environmental Mesonet website. Alternatively, the National Oceanic and Atmospheric Administration (NOAA) also provides extensive weather data [56].
In terms of the perimeter of this research, the top 14 countries active in or associated with the EVI during the period covering 2018 to 2019 have been targeted. They are ranked by market share of electric cars according to the IEA [57]. This list includes Norway, Iceland, Sweden, Netherlands, Finland, China, Portugal (as an observer), USA, Canada, France, New Zealand, United Kingdom, Germany and Japan [58].
Most of the repositories covered use the native language of their country. Therefore, the use of direct query search was minimized as it can be approximate, especially in a foreign language. Thus, the following standardized process was used for each repository covered: every time a categorical hierarchy was available, datasets under the following categories were searched for: "Environment", "Natural Resources", "Infrastructure", "Transportation", "Traffic", "Climate and Weather", "Urban Development", "Planning". If a category search was not enabled, then the following key words were used with their translated variants: "Travel (Survey)", "Electric Vehicle (or Car)", "Charge-Charging", "Traffic", "Station", "Air Quality", "Mobility".

Open Datasets
Overall, more than 860 repositories have been explored and more than 60 relevant datasets have been found that are directly (endogenous) or indirectly (exogenous) useful for modelling EV load. Table 1 summarizes the results found across all countries covered with the most relevant datasets in each category. Regarding EVSE data, a distinction is made between real-time and historical charging session data. Historical data gives information on charging sessions which occurred in the past. This is the essential type of data sought to model EV load. Real-time data refers to EVSE occupation information which is updated on short time frames (every few minutes) and not stored. It requires regular scraping to be transformed into a historical charging session dataset and only then can it be leveraged for EV load modelling. For each country the corresponding EV market share from the IEA [57] is provided as well as the estimated number of EVs sold to which this market share corresponds. The estimated number of EVs sold is calculated by multiplying the number of passenger car sales in 2019 given in [121] by the EV market share from the IEA [57]. In Figure 2, the national EV market share and estimated number of EVs sold are shown, coloured by the type of data available for each country. It is interesting to note that countries with the highest market share and number of EVs sold are not the ones for which historical charging session data was found. This demonstrates the existing gap between EV penetration in each country and the availability of open charging session data. First of all, countries for which historical charging session data was found will be discussed as it is the most relevant and rarest information to find. Then, the information available from countries without historical charging session data but with real-time charging session data will be outlined. Finally, the countries where only traffic information is available will be presented.
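The estimate of EVs sold described above is a single multiplication, sketched below for clarity. The function name and the input values are placeholders chosen for illustration; they are not figures taken from [57] or [121].

```python
def estimated_evs_sold(passenger_car_sales, ev_market_share_pct):
    """Estimated EVs sold = passenger car sales x EV market share (given in %)."""
    if passenger_car_sales < 0 or not 0 <= ev_market_share_pct <= 100:
        raise ValueError("sales must be non-negative and share within [0, 100]")
    return passenger_car_sales * ev_market_share_pct / 100.0

# Placeholder inputs for illustration only.
example = estimated_evs_sold(passenger_car_sales=1_000_000, ev_market_share_pct=2.5)
```

Since both inputs are reported with limited precision, the result is best read as an order of magnitude, which is consistent with the rounded values quoted for each country.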

Figure 2.
For each of the 14 countries in scope, the national EV market share [57] and the estimated number of EVs sold [121] are shown. HCS refers to historical charging session data, RTCS refers to real-time charging session data and T refers to traffic counts and/or travel survey data.

Countries with Historical Charging Session Data

Netherlands
6.6% national EV market share [57] equating to approximately 29,000 EVs sold in 2019 [121].
23 repositories were covered in the Netherlands with every type of relevant data found. First of all, ElaadNL [69] holds historical charging sessions which were studied in multiple papers [21,24]. With regards to traffic data, Onderweg in Nederland is the national travel survey published on a yearly basis [71]. While its tables are quite hard to study as-is for non-native speakers, they are summarized in English on another website [70]. Real-time data on utilization and consumption at public EVSEs installed in Rotterdam can be found on the website of EV-BOX, which is one of the EVSE providers [67]. Registered vehicles [72] and public EVSE locations are also available (e.g., in Eindhoven [68]). Additionally, historical traffic data from 2010 was found, extracted from 24,000 measurement points which store information on vehicles such as speed and travel time [122].

USA
2.4% national EV market share [57] equating to approximately 130,000 EVs sold in 2019 [121].
The open data search for the USA was extensive. Around 370 repositories were covered in the analysis. Among them three relevant charging session datasets were found [82–84]. The first provides a continuous dump of session data from 2018 on EV sessions recorded at city-owned EVSEs in Boulder (Colorado) [82]. The second gives the same information for charging sessions of EVs in the city of Palo Alto (California) from 2011 to 2017 [83]. Finally, the third provides an aggregated monthly view of transactions in the city of Evanston (Illinois) between 2016 and 2017 [84]. Furthermore, a charging session open dataset from Caltech, which is continuously updated in collaboration with PowerFlex, is available at [85] and an exploration of this dataset was produced in [123]. On top of these charging session datasets, EVSE locations are also available from the Alternative Fuels Data Center [81], as well as many travel surveys including the National Household Travel Survey (NHTS) [43], which are frequently used to simulate EV behaviours from conventional vehicles. In particular, a mobility survey of 203 residents was performed in April 2019 for the City of Boulder [87]. Information extracted from this survey, combined with the EV charging session dataset of the city of Boulder [82], could lead to a more consistent and accurate representation of EV load than using the more general NHTS. Finally, a large proportion of states share traffic volumes in various municipalities across the country (e.g., the city of Houston [86]).

France
2.1% national EV market share [57] equating to approximately 46,000 EVs sold in 2019 [121].
France (mainland) also has a large number of open data repositories. In total, 151 repositories were explored. Among them all kinds of relevant data were found. Firstly, charging sessions were recorded from April to May 2017 on Belib' stations in Paris [95]. Furthermore, the Paris Data website provides the real-time availability of Belib' public EVSEs in Paris [97], which can be scraped on a regular basis via an API in order to reconstruct a historical dataset. Regarding private EVSEs, the charging sessions of a fleet of EVs owned by SAP Labs France have been recorded since June 2017 [96]. This dataset is updated every three months. On top of charging session data, registered vehicles across the territory [101], traffic counts in numerous cities [98], real-time traffic [99], and a national travel survey [100] are available in order to perform a spatiotemporal analysis of EV load. Different road traffic open data repositories are gathered on the Cerema website [124].
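Reconstructing historical sessions from regularly scraped real-time availability can be sketched as follows. The snippet assumes that each poll of a charging point has already been stored as a (timestamp, status) pair; the status labels, the polling interval and the function name are hypothetical and do not reflect the fields of any specific API.

```python
from datetime import datetime

def snapshots_to_sessions(snapshots, busy_status="occupied"):
    """Turn (timestamp, status) polls for ONE charging point into sessions.

    A session starts at the first poll seen as busy and ends at the first
    later poll seen as not busy. A session still open at the last poll is
    returned with an end of None, since its true end is unknown.
    """
    sessions = []
    start = None
    for ts, status in sorted(snapshots):
        if status == busy_status:
            if start is None:
                start = ts
        elif start is not None:
            sessions.append((start, ts))
            start = None
    if start is not None:
        sessions.append((start, None))
    return sessions

# Hypothetical polls every five minutes for a single charging point.
polls = [
    (datetime(2019, 5, 1, 8, 0), "available"),
    (datetime(2019, 5, 1, 8, 5), "occupied"),
    (datetime(2019, 5, 1, 8, 10), "occupied"),
    (datetime(2019, 5, 1, 8, 15), "available"),
    (datetime(2019, 5, 1, 8, 20), "available"),
    (datetime(2019, 5, 1, 8, 25), "occupied"),
]
sessions = snapshots_to_sessions(polls)
```

The start and end times recovered this way are only accurate to within the polling interval, and occupancy does not distinguish charging from idle connection, so the resulting sessions bound the charging time from above.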
United Kingdom
2.1% national EV market share [57] equating to approximately 50,000 EVs sold in 2019 [121].
72 repositories were covered for the UK mainland which yielded multiple charging session datasets. Two of them are situated in Scotland: Dundee City [106] and Perth and Kinross City Councils [107]. The former gathers two years of charging session data from 2017 to 2018 while the latter covers four years from 2016 to 2019 at the granularity of each session. Additionally, the UK government led an EVSE analysis over the year 2017, covering domestic [125] and public [126,127] chargers. The raw datasets available include charging session data for each type of EVSE. There were also some initial trials led by UK Power Networks in 2013-2014 which can yield useful information [128]. Public EVSE locations are available in numerous municipalities of the UK [105] as well as a national charging point registry [104], along with real-time [108] and historical traffic counts [109]. Moreover, yearly national surveys are also available [110].

Countries with Real-Time but No Historical Charging Session Data

Norway
46.4% national EV market share [57] equating to approximately 69,000 EVs sold in 2019 [121].
Norway is by far the country which has the highest penetration rate of EVs to date. Thus, it is no surprise that some highly relevant data for EV load modelling was found regardless of a relatively small number of repositories available (13). Norway was an early-mover in fostering EV adoption. In 2009, the first large investments were made by cities and the government with Oslo being the major contributor [2]. The most relevant data feed comes from the NOBIL database API [59]. This service provides (after benefiting of an API key from NOBIL) real-time information on EVSEs all across Norway, Sweden, Finland and Denmark (e.g., location, usage, details). Historical dumps do not seem to be available through the API, however a regular scraping may be put in place in agreement with NOBIL in order to reconstruct historical data. Other data sources which describes exogenous variables are available such as traffic volumes [60,61] and vehicle registrations by fuel types [62] which gives an overview of the trend in EV adoption. Sweden 7.9% national EV market share [57] equating to approximately 28,000 EVs sold in 2019 [121].
Sweden, with 18 repositories covered, also benefits from the NOBIL API which gathers real-time information on public EVSEs activity across the territory [59]. Some of NOBIL's information is gathered on an external Swedish website which provides historical statistics on EV public charge use [129]. On top of that data source, the map of public EVSEs [65] and statistics on newly registered vehicles per county, town and fuel type on a monthly basis are also available [66]. This latter dataset can be used in load forecasting models as a variable explaining the trend in EV usage particularly thanks to its monthly granularity. Finland 4.7% national EV market share [57] equating to approximately 5700 EVs sold in 2019 [121].
Finland is also one of the countries which has adopted the NOBIL database API [59]. Amongst the 20 repositories covered, exogenous information was found on real-time traffic in a few municipalities (e.g., the city of Tampere [130]), as well as registered vehicles between 1922 and 2019 [74] and average distance travelled by vehicles between 1980 and 2015 [73]. Even though these sources provide extensive historical data, the most recent years are the most relevant for EV load models. These datasets can give a general understanding of traffic trends in Finland.
Germany
2% national EV market share [57] equating to approximately 69,000 EVs sold in 2019 [121].
With regard to Germany, the most relevant datasets found among the 52 repositories covered were real-time public EVSE usage [113] and real-time traffic data [114] in the city of Bonn. Scraping both sources and associating them can lead to precise EV load models. In addition, travel surveys at a fine level of detail are available from the German Mobility Panel [117] as well as the Rheinisch-Westfälisches Institute (RWI) [116]. The RWI dataset was used for a study on mobility patterns in [131]. Furthermore, the number of vehicles registered [118] and traffic counts in several municipalities [115] can give an understanding of the trend in EV usage across the country. Finally, as for most other countries, public EVSE locations are also available [112].

Countries with Traffic Data and No Charging Session Data
Iceland
17.2% national EV market share [57] equating to approximately 3100 EVs sold in 2019 [121].
As for Iceland, 4 repositories were covered and the most relevant datasets found do not include any charging sessions but rather descriptive statistics on transport in Reykjavik [63] as well as vehicle distance and fuel consumption between 1995 and 2019 [64]. This can enable an understanding of the trends in EV adoption and high-level travel behaviours. However, only limited analysis can be conducted as real charging session data are unavailable and would have to be simulated from other markets. Additionally, no real-world traffic data or travel survey was found, which also limits spatial studies.
China
4.5% national EV market share [57] equating to approximately 1,100,000 EVs sold in 2019 [121].
Being the country with the largest volume of EVs, China is at the forefront of EV deployment worldwide. However, this research did not result in finding any charging session data for China. One explanation for this is that more than 90% of EVSEs are owned by private firms [132]. Most of the articles which use data from charging stations on Chinese territory do not make it available, as it is usually part of an agreement between the researcher and the entity owning the data. Nevertheless, some relevant traffic data [76,77] for the whole territory was found, as well as travel surveys [78] and EVSE locations [75] specifically in Hong Kong.
Portugal
3.9% national EV market share [57] equating to approximately 8900 EVs sold in 2019 [121].
Regarding Portugal's open data, traffic statistics with the number of vehicles registered by type and fuel were found [80]. Additionally, EVSE locations in Lisbon were also available [79]. No charging session data was found.
Canada
2.3% national EV market share [57] equating to approximately 13,000 EVs sold in 2019 [121].
Being the co-lead of the EVI activities along with China [57], Canada is a major player in the field of EV deployment. Around 76 repositories were explored, with numerous travel surveys which describe various aspects of drivers' behaviours [92]. Traffic volumes [91] and EV registrations [93] are also available, with details on EVSEs available for public use in some municipalities (e.g., the city of Edmonton [89]). Even though neither historical nor real-time charging session data was found, there exists an EV Home Charging Program [90] which gathers residential charging session data. However, this dataset is not open at the time of writing but might be accessed with the relevant access grants.
For New Zealand, 22 repositories were covered with successful findings in traffic statistics and vehicle registrations. Several locations in New Plymouth record traffic counts [102] and the number of vehicles registered by type across the country is also available [103]. As-is, this data is difficult to exploit for EV load modelling as it lacks EVSE locations and charging sessions.
Japan
1.1% national EV market share [57] equating to approximately 48,000 EVs sold in 2019 [121].
Finally, with Japan, 14 repositories which did not contain any charging session or station location information were covered. Nevertheless, exogenous data can be extracted with numerous travel surveys [119] and some statistics on registered vehicles [120].

EV Load Models
The scope of this review focuses on papers detailing an EV load model as defined in Section 2. Most often, the model output is the power or energy demand at EVSEs, but it can also be closely related features (e.g., EV arrival/departure times, charging durations) from which the load can be reconstructed. The papers presented in this review were selected from literature search engines with the following keywords: "Electric", "Vehicle", "Load", "Model (or Modelling)". In particular, emphasis was placed on presenting a wide variety of methods to encompass multiple modelling settings.
The purpose of this section is to enable an understanding of the strengths and weaknesses of the methodologies proposed to model EV load. From the papers considered for this review, EV load models can be segmented into three categories: statistical characterization, stochastic processes and machine learning models. The comprehensive list of models considered in this review is presented in Appendix A, Table A1.

Statistical Characterization
The goal of statistical characterization models is to produce a distributional analysis of the outputs, whether it is data-driven [24] or entirely deduced from exogenous variables such as travel data and statistical assumptions [14]. The different characterizations of EV load and proxy variables such as charging duration or inter-arrival time are summarized in Table 2.

Table 2. Statistical characterization models: strengths, weaknesses and references.

| Model | Strengths | Weaknesses | References |
|---|---|---|---|
| Gaussian | Particularly suited for large simulations | Unrealistic as negative values have a non-zero probability | [14] |
| Weibull, Lognormal, Exponential | Rapid implementation while providing an approximation consistent with reality | Fail to capture significantly diverse behaviours in the data | [13] |
| Mixtures (e.g., Beta, Gaussian) | Captures significantly different users' behaviours in the data and respects real-world constraints | Unsuitable for medium or large dimension problems with numerous covariates | [24,123,133,134] |
| KDE | Highly versatile model as no explicit prior on the distribution is required | Weak interpretability power in addition to a sensitivity to outliers | [135][136][137][138][139] |

In [14], the authors did not benefit from any EVSE data. Nevertheless, they used the NHTS [43] ICEV behaviours from 2009 to derive EV travel patterns in order to simulate an EV fleet and characterize their behaviours. In their work, the simulation showed that the power consumption can be seen as a normal distribution without any loss of accuracy. This can be true in practice; however, it is usually more consistent to assign distributions which are defined on R+, as it is unrealistic to observe negative power demand in that context. The Gaussian assumption is, however, convenient for model conciseness and computational speed.
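To make the remark on negative values concrete, the following minimal sketch computes the probability mass that a Gaussian load model places below zero. The mean and standard deviation are purely illustrative assumptions and are not taken from [14].

```python
from statistics import NormalDist

# Hypothetical Gaussian model of per-EV charging power (kW).
# Both parameters are illustrative assumptions, not values from the literature.
MEAN_KW = 3.7
STD_KW = 2.0

power_model = NormalDist(mu=MEAN_KW, sigma=STD_KW)

# Probability that the model produces a physically impossible negative demand.
p_negative = power_model.cdf(0.0)
print(f"P(power < 0) = {p_negative:.4f}")
```

With these assumed parameters roughly 3% of the sampled values would be negative, which is why distributions supported on R+ are usually preferred when the mean is not large relative to the spread.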
In [13], a statistical analysis is conducted on data extracted from an EV trial conducted in Victoria (Australia) on 33 EVs over a 3-month period. This article showed that the Weibull distribution was the best fit for charging duration compared to the exponential and lognormal laws. The authors also characterized the time to the next charging event as a mixture of two lognormal distributions. This is a vehicle-centric approach which considers the time to next charge from the EV perspective. These characterizations were used in a Monte Carlo simulation which created 4000 EVs by random sampling and assessed their overall impact on the grid. While these distributions are more consistent than a Gaussian distribution, they still fail to capture the irregularity of EV drivers' behaviours hidden in the data.
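The vehicle-centric Monte Carlo procedure described above can be sketched as follows. This is a simplified illustration rather than the implementation of [13]: the Weibull and lognormal parameters, the charger power and the hourly resolution are all assumptions chosen for readability.

```python
import random


def sample_time_to_next_charge(rng, weight=0.6):
    """Mixture of two lognormals (hours); all parameters are illustrative."""
    if rng.random() < weight:
        return rng.lognormvariate(2.5, 0.4)   # frequent chargers
    return rng.lognormvariate(3.8, 0.5)       # occasional chargers


def simulate_fleet(n_evs=4000, horizon_h=168, power_kw=3.3, seed=0):
    """Return the aggregate hourly load (kW) of a simulated EV fleet."""
    rng = random.Random(seed)
    load = [0.0] * horizon_h
    for _ in range(n_evs):
        t = rng.uniform(0.0, 24.0)  # assumed random offset of the first session
        while t < horizon_h:
            # random.weibullvariate takes (scale, shape); values are assumptions
            duration = rng.weibullvariate(3.0, 1.5)
            first_hour = int(t)
            last_hour = min(horizon_h, int(t + duration) + 1)
            # Coarse approximation: full power in every hour the session touches
            for hour in range(first_hour, last_hour):
                load[hour] += power_kw
            t += duration + sample_time_to_next_charge(rng)
    return load


weekly_load = simulate_fleet()
print(f"Peak simulated load: {max(weekly_load):.0f} kW")
```

Replacing the assumed parameters with values fitted on an open charging session dataset would turn this sketch into a first grid impact assessment.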
In [24], a dataset provided by Elaad NL [69] has been studied. This paper characterizes EV load through a mixture of beta distributions. Its parameters are optimized by minimizing the Root Mean Squared Error (RMSE) of the point-wise difference with the empirical distribution. Additionally, Kolmogorov-Smirnov testing was used to assess the goodness-of-fit. From the observation that weekly charging sessions present two peaks (namely a morning and a late afternoon peak), it was reasonable to consider a mixture of distributions to account for the different modes. In [133], 13 different charging session profiles were identified using Gaussian mixture clustering based on data provided by the G4 cities of the Netherlands. Other recent studies complement this work by using Gaussian mixtures to model the triplet (arrival time, charging duration, energy consumed) in order to characterize EV load. In [123] the triplet is modelled by a multivariate Gaussian mixture, while in [134] only the couple (charging duration, energy consumed) is modelled by a Gaussian mixture, with the arrival time modelled by an exponential distribution. The results produced are more accurate than for elementary distributions. However, these models are structurally limited to the joint use of a few covariates, which prevents them from fully integrating exogenous information.
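As an illustration of the RMSE-based fitting idea, the sketch below fits only the mixing weight of a two-component beta mixture to a histogram of synthetic arrival times (normalised to the interval [0, 1] of a day). The component shapes, the synthetic data and the grid search are assumptions made for brevity; a full fit in the spirit of [24] would also optimise the shape parameters.

```python
import math
import random


def beta_pdf(x, a, b):
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def mixture_pdf(x, w, comp1, comp2):
    return w * beta_pdf(x, *comp1) + (1 - w) * beta_pdf(x, *comp2)


def histogram_density(samples, n_bins):
    """Empirical density on [0, 1] estimated with equal-width bins."""
    counts = [0] * n_bins
    for s in samples:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    width = 1.0 / n_bins
    return [c / (len(samples) * width) for c in counts]


def rmse(w, density, comp1, comp2):
    """Point-wise RMSE between the mixture pdf and the empirical density."""
    n_bins = len(density)
    centres = [(i + 0.5) / n_bins for i in range(n_bins)]
    squared = [(d - mixture_pdf(x, w, comp1, comp2)) ** 2
               for d, x in zip(density, centres)]
    return math.sqrt(sum(squared) / n_bins)


# Assumed component shapes: a morning peak and a late afternoon peak.
MORNING, EVENING = (8, 20), (20, 8)

# Synthetic data standing in for real sessions; 40% belong to the morning mode.
rng = random.Random(42)
samples = [rng.betavariate(*MORNING) if rng.random() < 0.4
           else rng.betavariate(*EVENING) for _ in range(5000)]

density = histogram_density(samples, n_bins=24)
best_w = min((k / 100 for k in range(101)),
             key=lambda w: rmse(w, density, MORNING, EVENING))
print(f"Estimated morning weight: {best_w:.2f}")
```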
A few articles also modelled EV load with a kernel density estimator (KDE). Two main types have been used in the literature: the Gaussian kernel density estimator (GKDE) and the diffusion kernel density estimator (DKDE). These methods are highly versatile because no prior knowledge over the distribution is hypothesized. Thus, they can reach high accuracy when fitting empirical data at the cost of weak interpretability. Looking at [135], a GKDE is used to estimate daily trip distance and end time of the last trip. Both variables are critical for EV charging schedules and this method improves the accuracy of the distributions compared to parametric methods. A similar conclusion is drawn in [136] from a GKDE estimating the triplet (arrival time, charging duration, charging capacity). In [137,138] the authors have compared both the GKDE and DKDE when estimating EV load. Thanks to its optimal bandwidth selection process, DKDE was found to produce better load estimations. Finally, in order to make the best of both GKDE (which is less sensitive to outliers) and DKDE (which has a higher overall accuracy), [139] has proposed a hybrid density estimator (HKDE). This HKDE reached significantly better root-mean square performance in estimating the EV load than the DKDE and GKDE on their own on the dataset used for this study.
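A Gaussian kernel density estimator is short enough to be written out directly. The sketch below uses the normal-reference rule of thumb for the bandwidth, which is an assumption made here for simplicity and not the selection process used in the DKDE or HKDE papers; the arrival hours are invented for illustration.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function evaluating a Gaussian KDE built on `samples`."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; tends to oversmooth multimodal data
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)

    return pdf


# Hypothetical arrival times (hour of day) with a morning and an evening cluster.
arrival_hours = [7.5, 8.0, 8.2, 9.1, 17.0, 17.5, 18.2, 19.0]
pdf = gaussian_kde(arrival_hours)
print(f"Estimated density at 08:00: {pdf(8.0):.3f}")
```

Because the rule-of-thumb bandwidth is driven by the overall standard deviation, it blurs the two clusters, which illustrates why data-driven bandwidth selection matters for bimodal charging behaviour.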

Stochastic Processes
In the context of EV load models, three main types of stochastic processes have been detailed in the literature: purely temporal, spatiotemporal and queuing theory viewpoints. The various stochastic processes presented are summarized in Table 3. One of the early works on EV temporal load models was completed in [15] where the authors explored the stochastic nature of EV load by using probabilistic travel patterns to determine initial SoC and starting time of battery charge. In particular, assuming battery type is known, recharge starting time is then assumed to be a random variable with a probability density function (pdf) determined by the tariff structure (scenarios) and patterns of EV usage. Initial SoC is also considered as a random variable dependent on the total distance travelled since last charge. Introducing a lognormal pdf for the daily distance driven, the initial SoC can be derived assuming a linear discharge (also assuming that it was fully recharged originally). Finally, they obtain a discretized version of the stochastic process of the load on half hourly intervals for a single EV which is then extended to an arbitrary number of EVs.
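The chain of assumptions described for [15] (lognormal daily distance, linear discharge from a full battery) can be sketched as below. The lognormal parameters, driving range, battery capacity and charger power are hypothetical values introduced for illustration only.

```python
import random


def sample_initial_soc(rng, mu=3.2, sigma=0.6, range_km=150.0):
    """Initial state of charge after one day of driving, in [0, 1].

    The daily distance is lognormal and the battery discharges linearly
    with distance, starting from a full charge.
    """
    distance_km = rng.lognormvariate(mu, sigma)
    return max(0.0, 1.0 - distance_km / range_km)


def charging_energy_kwh(soc, capacity_kwh=24.0):
    """Energy required to bring the battery back to a full charge."""
    return (1.0 - soc) * capacity_kwh


def charging_duration_h(soc, capacity_kwh=24.0, power_kw=3.3):
    """Charging time at constant power (charger losses are ignored)."""
    return charging_energy_kwh(soc, capacity_kwh) / power_kw


rng = random.Random(0)
soc = sample_initial_soc(rng)
print(f"Initial SoC: {soc:.2f}, charge time: {charging_duration_h(soc):.1f} h")
```

Combining such samples with a distribution of charging start times yields a discretized load profile for one EV, which can then be summed over a fleet.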
In [140], the authors defined a temporal stochastic process modelling charging patterns at a public EVSE with a Markov Chain comprising three states: unoccupied, charging and plugged-in but not charging. Essentially, the Markov Chain setup assumes that the current state of the process, conditionally on all past states, only depends on the previous state. It simplifies the calculation and has been extensively studied in the literature through many applications [143]. In [140], after initializing the transition probability matrix which drives the path of the process, they let the system evolve and assess the revenue made by the charging station.
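A three-state chain of this kind can be simulated in a few lines. The transition probabilities below are hypothetical placeholders, not the matrix used in [140]; in practice they would be estimated from observed EVSE status logs.

```python
import random

STATES = ["unoccupied", "charging", "plugged_idle"]

# Hypothetical per-step transition probabilities; each row sums to 1.
# Row order follows STATES: (to unoccupied, to charging, to plugged_idle).
TRANSITIONS = {
    "unoccupied":   [0.90, 0.10, 0.00],
    "charging":     [0.05, 0.80, 0.15],
    "plugged_idle": [0.30, 0.00, 0.70],
}


def simulate(n_steps, start="unoccupied", seed=0):
    """Simulate the EVSE state path; the next state depends only on the current one."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        weights = TRANSITIONS[path[-1]]
        path.append(rng.choices(STATES, weights=weights)[0])
    return path


path = simulate(10_000)
charging_share = path.count("charging") / len(path)
print(f"Share of time spent charging: {charging_share:.2%}")
```

Multiplying the time spent in the charging state by a tariff would give a simple revenue estimate for the station.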
Auto-regressive integrated moving averages (ARIMA) are a particular type of temporal process. Box and Jenkins [144] formalized a precise methodology to estimate the different orders of ARIMA processes. In [37] the ARIMA process is quantized on hours of the day. In other words, 24 sub-processes are estimated in their model. The final process obtained is thus a day-ahead hourly forecaster of EV load. In a follow-up paper [141], they improved the performance of their model by forecasting conventional load and EV load separately. The results obtained in this paper reinforce the argument that EV load is structurally different from conventional load and requires specific load forecasting models.
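The idea of estimating one sub-process per hour of the day can be illustrated with a deliberately simplified stand-in: an AR(1) model with intercept fitted by ordinary least squares for each hour. This is not the full ARIMA identification of [37]; it only shows how the 24 sub-series are built and forecast one day ahead.

```python
def fit_ar1(series):
    """Least-squares fit of y[d+1] = c + phi * y[d]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        return mean_y, 0.0  # constant history: forecast the mean
    phi = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - phi * mean_x, phi


def day_ahead_forecast(hourly_load):
    """Forecast the next 24 hourly values from a flat list of past hourly loads.

    The history is split into 24 sub-series (one per hour of the day) and
    each sub-series is forecast independently.
    """
    n_days = len(hourly_load) // 24
    forecast = []
    for hour in range(24):
        series = [hourly_load[day * 24 + hour] for day in range(n_days)]
        c, phi = fit_ar1(series)
        forecast.append(c + phi * series[-1])
    return forecast
```

A usage example would pass at least a few weeks of hourly EV load so that each of the 24 sub-series contains enough observations to estimate its two coefficients.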
Similarly, in [44] the authors modelled household EV load demand by using stochastic behaviours of three random variables: start-time of trip, end-time of trip and travelled distance. With a vehicle-centric approach, they present a Monte Carlo simulation method to derive overall system load. A particularity of this model is that it used a copula to characterize the multivariate distribution function of model variables. Then, using typical EV charging profiles, they derived the electricity demand at different EV uptake levels while observing the grid impacts.
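To show how a copula separates the dependence structure from the marginal distributions, the sketch below draws dependent pairs (home arrival time, daily distance) with a Gaussian copula. The choice of a Gaussian copula, the correlation and both marginals are assumptions for illustration; the copula family and variables in [44] may differ.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def sample_trip(rng, rho=0.6):
    """Draw (arrival hour, distance in km) linked by a Gaussian copula."""
    # Step 1: correlated standard normals
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # Step 2: map to dependent uniforms (clipped to keep inv_cdf well defined)
    u1 = min(max(STANDARD_NORMAL.cdf(z1), 1e-12), 1.0 - 1e-12)
    u2 = min(max(STANDARD_NORMAL.cdf(z2), 1e-12), 1.0 - 1e-12)
    # Step 3: apply the inverse CDF of each (hypothetical) marginal
    arrival_hour = NormalDist(17.5, 1.5).inv_cdf(u1)
    distance_km = math.exp(NormalDist(3.2, 0.6).inv_cdf(u2))  # lognormal
    return arrival_hour, distance_km


rng = random.Random(0)
trips = [sample_trip(rng) for _ in range(2000)]
```

Any marginal with an invertible CDF can be substituted in step 3 without changing the dependence introduced in steps 1 and 2.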
Purely temporal models are particularly suited for one EVSE or one EV. They are not well suited for modelling clusters of EVSEs, which require spatial considerations.

Spatiotemporal
Spatiotemporal models are usually designed for disaggregated approaches. The EV load at different stations is modelled separately using temporal features as well as travel patterns. They are rare in the literature as they require the combination or simulation of both the charging sessions and EV trips. Furthermore, they are limited as they cannot scale to large geographical scopes. Nevertheless, they can explore in fine detail the intricacy of the relationship between EVs and EVSEs in specific regions.
In [9], the authors introduced a spatiotemporal model using Monte-Carlo simulation to specifically assess EV load demand in urban areas. The core of this method lies in the origin-destination analysis used to determine daily travel patterns of EVs. Additionally, probability functions to describe EV characteristics were identified. Using both travel patterns and EV characteristics, they ran a Monte-Carlo estimation of EV charging load for each busbar. By construction, this model can also be used for probabilistic assessment which indicates the branches most vulnerable to potential overloading.
In [18], the authors modelled both temporal and spatial stochastic aspects of PHEV owners' behaviours to then derive their pdf. They modelled the temporal dimension with a uniform distribution for the start and end of charging time. As for the spatial dimension, they described the number of PHEVs arriving at an EVSE by a Poisson process according to driving behaviour and traffic state. Assuming that both dimensions are independent, they derived the joint spatiotemporal pdf by multiplying both individual pdfs for charging times and arrival at EVSE. Ultimately, they expressed the effect on the daily load curve under various numbers of PHEVs for 150 PHEVs dispersed in the test system.
Finally, [32] proposed another probabilistic approach to characterize the spatiotemporal diversity of EV charging demand specifically on peak load demand. A Monte-Carlo simulation was used to evaluate the impacts of charging demand on the grid in urban and rural environments. It showed that this diversity of location helped the grid handle the demand better.

Queuing Theory
Queuing theory models often use Kendall's shorthand notation which describes the arrival (A), the serving time (B) and the number of servers (C) in a compact form: A/B/C. EV load models are a suitable context for this theory as it was detailed in numerous articles [16,29,38,39,142].
One of the early works on EV load modelling was performed in [29]. This simple theoretical approach proposed to use an M/M/n_max queue where the first two components characterize the Poisson processes for the number of EVs arriving at a public EVSE and the number of EVs served, while n_max refers to the maximum number of EVs charging in parallel at charging points. A case study was conducted on the first car produced by Tesla, the Roadster, in order to assess the stochastic power demand output from the model. The same queuing model was also used in [38] and was compared to a Monte Carlo simulation in order to ultimately fit a distribution for the entire load demand of PHEVs. Additionally, in [142] the authors also used this queuing model and complemented it with a fluid traffic model in order to look at EV charging load on highway charging stations.
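For a multi-server Markovian queue, standard closed-form results give a quick feel for station behaviour. The sketch below assumes an unlimited waiting area (the classical M/M/n setting) and computes the Erlang C waiting probability and the mean power drawn; whether [29] assumes waiting or blocking is not stated here, so this is an illustrative assumption, as are the numerical values.

```python
import math


def erlang_c(n_chargers, offered_load):
    """Probability that an arriving EV has to wait in an M/M/n queue.

    offered_load = arrival rate / service rate (in Erlangs).
    """
    if offered_load >= n_chargers:
        raise ValueError("unstable queue: offered load must be below n_chargers")
    tail = (offered_load ** n_chargers / math.factorial(n_chargers)
            * n_chargers / (n_chargers - offered_load))
    head = sum(offered_load ** k / math.factorial(k) for k in range(n_chargers))
    return tail / (head + tail)


def mean_power_kw(arrival_rate, service_rate, charger_kw):
    """Mean station demand: in a stable queue the mean number of busy
    chargers equals arrival_rate / service_rate."""
    return arrival_rate / service_rate * charger_kw


# Hypothetical station: 6 EVs per hour, 30 min mean charge, 4 chargers of 50 kW.
p_wait = erlang_c(n_chargers=4, offered_load=6 / 2)
print(f"P(wait) = {p_wait:.2f}, mean power = {mean_power_kw(6, 2, 50):.0f} kW")
```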
In a more general fashion, the authors of [16] have opted for an M_t/GI_t/∞ queue where the number of arrivals follows an inhomogeneous Poisson process (indicating that the intensity function varies over time) and the serving time follows a general time-dependent distribution, with an infinite number of servers (EVSEs in the EV load context). Using some established results of queuing theory and previous work on estimating non-homogeneous Poisson process rates, the authors managed to forecast each disaggregated intensity function for day-ahead forecasting. This paper is the only one found for stochastic processes applied to EV load which uses both travel patterns from the NHTS and real charging session data.
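A classical result for infinite-server queues with time-varying Poisson arrivals is that, starting from an empty system, the mean number of customers in service at time t is the integral of the arrival rate weighted by the probability that service is still ongoing. The sketch below evaluates this integral numerically for the simpler case of a service distribution that does not depend on time; the arrival rate and Weibull charging time are hypothetical.

```python
import math


def mean_occupancy(t, arrival_rate, survival, n_steps=1000):
    """m(t) = integral over [0, t] of arrival_rate(s) * P(S > t - s) ds.

    Evaluated with the midpoint rule; the system is assumed empty at time 0.
    """
    h = t / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h
        total += arrival_rate(s) * survival(t - s)
    return total * h


def arrival_rate(s):
    """Hypothetical arrivals (EVs per hour) with a daily cycle peaking at 17:00."""
    return 4.0 + 3.0 * math.sin(2.0 * math.pi * (s - 11.0) / 24.0)


def survival(x):
    """P(charging time > x hours) for an assumed Weibull charging time."""
    return math.exp(-((x / 2.5) ** 1.5))


evs_at_18h = mean_occupancy(18.0, arrival_rate, survival)
print(f"Expected EVs charging at 18:00: {evs_at_18h:.1f}")
```

Multiplying the occupancy by the charger power gives an expected load curve, which is the quantity of interest for day-ahead forecasting.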
Thus, one important advantage of queuing network analysis applied in a spatiotemporal context of EV load is that it can capture interactions among multiple charging stations. In that sense, BCMP networks (named after their inventors: Baskett, Chandy, Muntz and Palacios) introduced in [145] were applied in [39] to produce an EV load model. BCMP networks are a type of queuing network which yield a product-form stationary distribution. This kind of network is commonly used to study interconnected queues. In the EV load context, it means that it enables the model to take into account the potential shift of users from one station to another and control it to envisage different scenarios.
It is clear that queuing models are to be reserved for theoretical considerations rather than for operational implementation. Nevertheless, thanks to their solid mathematical foundations, they bring great insights for understanding EV load behaviours especially when EVSE data are scarce.

Machine Learning
Four machine learning branches have mainly been explored for modelling EV load: Linear Model (LM), Support Vector Machine (SVM), Random Forest (RF) and Artificial Neural Network (ANN). In [31] the authors compare decision trees/tables, SVM and ANN. SVM demonstrated the best performance while the ANN and decision trees are 10 times quicker to test on new data. A limitation of this work is that the dataset does not come from real charging session data but from an aggregated distributional analysis produced by ECOtotality [146]. In [147], SVM, RF, k-Nearest Neighbours (k-NN) and a method called Modified Pattern-based Sequence Forecasting (MPSF) which uses k-means are compared. They found that SVMs and RF reach the best performance with regard to the Mean Absolute Error while k-NN and MPSF achieve better performance with regard to the Symmetric Mean Absolute Percentage Error. Since k-NN and MPSF compute predictions much faster, they concluded that MPSF and k-NN were better suited for operational use. The different machine learning branches studied for EV load models are gathered in Table 4.

Table 4. Machine learning models: strengths, weaknesses and references.

| Model | Strengths | Weaknesses | References |
|---|---|---|---|
| … | … | … | [8,31,33,147] |
| Random Forest | Versatile model with no prior assumptions on the shape of the data | Weak interpretability with no ability to extrapolate from training data | [12,33,147,151] |
| Neural Networks | Can reach the highest level of performance | Architecture selection process can be laborious with long training time | [28,31,33,41,49,152] |

Linear Model
It is common practice to start addressing a machine learning problem with simple models such as LM. In [148,149] LM was chosen as a first step to implement a smart charging strategy. This gives a more realistic operational context as opposed to other articles which skip predictive models before implementing an optimal charging strategy. Furthermore, in [150] an assessment of model inputs is presented using LM. They found that the voltage level of each EV had a critical influence over their model. However, these models are limited as they cannot capture irregular patterns in the data, which are expected across EV drivers.

Support Vector Machines
SVM were originally defined by Vapnik [153]. In a nutshell, the idea behind this algorithm is to find the hyperplane which maximizes the margin between different sets of populations. It is easy to implement but yields relatively long training times when working with large datasets.
In the context of EV load, SVM were compared to a Monte-Carlo forecasting technique in [8] and showed a better performance on a theoretical charging session dataset. Additionally, in [33] SVMs are used alongside other machine learning algorithms in order to model EV load from a vehicle-centric as well as from an EVSE-centric perspective. Because the EVSE-centric approach requires more data, it demonstrates a significantly longer running time as expected for SVMs. This study used a dataset extracted from UCLA campus parking lots. Thus, it is unlikely that these kinds of models will scale adequately for a larger scope of charging stations. Furthermore, articles using SVMs are now becoming rare as other alternatives with similar or better performances can be found.

Random Forests
RF is a learning algorithm which was popularized by Leo Breiman [154]. In short, it is an ensemble method which uses decision trees as elementary components for its construction.
On top of SVMs, [33] also used RF to model EV load. The few hyperparameters that need to be tuned (e.g., number of trees, sampling rates) enable a fast and easy implementation with the possibility to iterate rapidly. In [12], RF demonstrated their ability to forecast day-ahead EV load charging blocks for households in an EVSE-centric fashion. As mentioned in previous paragraphs, the EVSE-centric approach can be difficult to implement as it requires large amounts of data and complex modelling. Thus, the use of RF for this kind of disaggregated approach is adequate. [151] precisely illustrates the ability of RF to handle the EV load problem from both time and spatial dimensions. This article shows that RF can model both a single station and a group of stations considering spatial and temporal inputs. Single-station models are more accurate as they have more consistent behaviours, while the group-of-stations model is slightly less accurate in terms of the mean absolute percentage error but brings a more holistic view to the problem. Other ensemble methods which stemmed from the same area of machine learning, such as gradient boosting, could also be considered [149,155].

Neural Networks
ANNs were initially presented by Frank Rosenblatt in 1958 [156] in their most elementary form under the name of the perceptron. They were extended shortly afterwards to the Multi-Layer Perceptron (MLP). After being forgotten for a few decades, ANNs have experienced a rebound in interest from the end of the 1980s, in particular with the formulation of the backpropagation algorithm [157] and the breakthroughs in computer vision with Convolutional Neural Networks (CNN) [158].
In [41] an MLP with a tilted loss function is used for probabilistic forecasting of EV load. It is compared with a kernel density estimator as well as quantile regression and it showed the best performance using the same inputs and outputs. It is quite common across various scientific fields that ANNs reach the highest performance on many problems compared to other machine learning or statistical methods. The main drawback is the lack of interpretability of such models, which are highly complex [159,160]. Elaborate ANN architectures such as CNNs [49] and Recurrent Neural Networks (RNNs) [28] have been explored for modelling EV load. [152] compares 12 different architectures including CNNs and RNNs. In this last article, the Long Short-Term Memory (LSTM) architecture showed the best performance on the dataset studied. From all these articles it is challenging to decide which ANN architecture is the best overall for EV load modelling. However, some clear conclusions can be made. RNNs perform particularly well as they take into account historical EV load values. In the current operational context, this is information that is hard to obtain at fine time steps. Thus, until real-time communication channels are available, it is likely that the most useful ANN models will be CNNs or RNNs with larger timesteps.
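The tilted (or pinball) loss mentioned above for probabilistic forecasting penalises under- and over-prediction asymmetrically, so that minimising it yields a quantile rather than a mean. The sketch below shows the loss itself and checks this property on a toy sample; the load values are invented and no neural network is involved.

```python
def tilted_loss(tau, y_true, y_pred):
    """Pinball loss for quantile level tau in (0, 1)."""
    error = y_true - y_pred
    return max(tau * error, (tau - 1.0) * error)


def mean_tilted_loss(tau, observations, prediction):
    return sum(tilted_loss(tau, y, prediction) for y in observations) / len(observations)


# Toy sample of observed loads (kW); values are illustrative only.
loads_kw = [float(v) for v in range(1, 11)]

# The constant prediction minimising the mean tilted loss is a tau-quantile.
TAU = 0.85
best_constant = min(loads_kw, key=lambda c: mean_tilted_loss(TAU, loads_kw, c))
print(f"Best constant prediction at tau={TAU}: {best_constant} kW")
```

Training one output per quantile level with this loss is a common way to obtain prediction intervals from a network such as an MLP.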

Matching EV Load Models to Open Datasets: A Preliminary Study
Six datasets dealing with historical charging session information have been selected for their completeness and accessibility: Boulder [82], Palo Alto [83], Dundee [106], Perth [107], Paris [95] and Domestics UK [125]. According to this research, none of these datasets have been used in the EV load modelling literature so far. The purpose of this section is to identify the variables available and to enable a high-level understanding of charging behaviours. In addition, an association of the six datasets selected with the models reviewed in Section 4 is proposed.

Variables and Data Quality
The fields available in each of the six datasets selected are summarized in Figure 3. These six datasets provide us with session start and end times as well as the energy consumed. With the exception of the Domestics UK dataset, the station address (location), and the power level of the charging port are available in these datasets. In addition, the Palo Alto and Boulder datasets contain gasoline and greenhouse gases (GHGs) savings as well as the charge duration which represents the amount of time the vehicle was plugged-in and actively charging. This is different to the park duration which also captures the time a vehicle was plugged-in and no longer charging which is a variable only given in the Palo Alto dataset. However, this park duration can be deduced from the session start and end times in the remaining datasets. Finally, for customer specific information, the Paris data provides a unique identifier per customer badge and the Palo Alto dataset gives the post code registered by the driver. Information regarding postcodes is interesting for models that include travel inputs such as the distance between the driver's home and stations nearby. Additionally, a data quality analysis was conducted on the six datasets. In [152], outliers were identified by using a set threshold from the variability between current and previous values. Instead, in this analysis, fixed boundaries were chosen and the following set thresholds were observed:

• Charge and/or Park Duration has to be positive and less than 24 h
• Energy Consumption needs to be positive and less than 100 kWh

The first criterion is important as some datasets have some obvious errors in the end times column which are set in 1970. This might indicate a manipulation error from the customer which led to a computational mistake along the process of data collection. Additionally, as most charging sessions last for a few hours, charging sessions that lasted for more than a day were discarded.
Similarly, recorded energy consumption values for Perth [107] and Dundee [106] are highly variable, reaching anomalously negative and highly positive values indicative of potential errors. The 100 kWh upper bound was chosen as it is close to the highest capacity of the Tesla Model S which is the EV with the largest battery capacity amongst the most widespread models [161].
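The two fixed-boundary criteria can be expressed as a simple filter. The function and argument names below are hypothetical and would need to be mapped to the column names of each dataset.

```python
from datetime import datetime, timedelta

MAX_DURATION = timedelta(hours=24)
MAX_ENERGY_KWH = 100.0


def is_valid_session(start, end, energy_kwh):
    """Keep sessions with 0 < duration < 24 h and 0 < energy < 100 kWh."""
    duration = end - start
    duration_ok = timedelta(0) < duration < MAX_DURATION
    energy_ok = 0.0 < energy_kwh < MAX_ENERGY_KWH
    return duration_ok and energy_ok


# Hypothetical records: (start, end, energy in kWh)
sessions = [
    (datetime(2019, 5, 1, 8, 0), datetime(2019, 5, 1, 10, 30), 12.4),  # valid
    (datetime(2019, 5, 1, 8, 0), datetime(1970, 1, 1, 0, 0), 9.8),     # end time in 1970
    (datetime(2019, 5, 2, 9, 0), datetime(2019, 5, 2, 11, 0), -3.0),   # negative energy
    (datetime(2019, 5, 3, 9, 0), datetime(2019, 5, 5, 9, 0), 20.0),    # longer than a day
]

clean = [s for s in sessions if is_valid_session(*s)]
discarded_share = 1.0 - len(clean) / len(sessions)
```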
If a transaction does not fit these criteria, it is discarded from the following analysis. This preparation had very little impact on the Palo Alto dataset with only 0.17% of transactions discarded, while the Boulder, Dundee, Perth, Paris and Domestics UK datasets have seen 8%, 11%, 4%, 14% and 7% of their data discarded respectively.

Figure 4 shows the trend in the total number of transactions per day over each dataset's specific time frame. Due to increasing EV uptake [1], an increase in EV charging sessions is expected, as illustrated by Palo Alto and Perth. However, this is not the case for Boulder and Dundee. Instead, a decreasing number of charging sessions at the end of each time series can be observed. This could be due to external factors such as an increase in charging session prices. As for the Domestics UK dataset, only 1 year of data is available, which indicates that the plot shown describes the yearly cycle rather than the long-term trend. Similarly, the Paris data cannot be extrapolated as it only represents two months of data.

Overall statistics of the six datasets in scope are provided in Table 5. The dataset which covers the largest time frame is from the city of Palo Alto with 6 years, followed by Perth with 4 years, Boulder with a little over 2 years, Dundee with 2 years, Domestics UK with 1 year and Paris with 2 months. In terms of transactions (or sessions), Domestics UK records the largest number. Moreover, Dundee and Boulder both cover 2 years of data but Dundee has close to three times more transactions. Naturally, there are consistently more transactions on weekdays than on weekends in total and on average across all datasets. Furthermore, the average park duration and charge duration vary significantly across the datasets. Indeed, while for Palo Alto the average park duration is around 2 h and 40 min, in Perth it is closer to 1 h and 15 min, so less than half of the time.
Additionally, the average Park Duration for Domestics UK is greater than 9 h, which is expected as this dataset describes residential charging rather than public charging as in the others. The Charge Duration, on the other hand, is relatively close for Boulder and Palo Alto, which are both American cities (located in Colorado and California). Finally, the average energy demand is consistently between 8 and 11 kWh across all datasets.

Suggested Matching of EV Load Models with the Datasets Considered
From this exploratory analysis, some suggestions can be given on how to match the EV load models reviewed in Section 4 with the six datasets presented above. These suggestions are summarised in Table 6. The first criterion identified for this selection is whether the dataset describes public charging or residential charging. Only two of the models reviewed deal with residential charging and will thus be assigned to the Domestics UK dataset. They would benefit from the large number of records and the customer identifier provided in this dataset. Thus, it is a good setup for vehicle-centric approaches [44] and machine learning models [12].
Looking now at public charging sessions, Boulder and Palo Alto are the only two datasets which gathered GHG and gasoline savings. These fields are rather uncommon across charging session datasets and they can enable an environmental impact analysis of EVs. However, it would be limited to EV usage rather than a holistic environmental impact with lifecycle assessments [163]. Thus, no mention of this kind of analysis was found in the articles reviewed.
Nevertheless, Palo Alto also records the driver's post code. With this knowledge, fine spatiotemporal processes can be derived as proposed in [32]. Additionally, the large number of records available is well suited to testing the scalability of queuing models [16,39] and spatiotemporal processes [9,18] which require travel information. It also provides an ideal setup for deep learning models which require large training sets [28,33,41,49,152].
As discussed in Section 3, Boulder not only holds a charging session dataset but also a travel survey conducted in 2018 with a focus on EVs [87]. It is a rather qualitative survey and can be used in combination with the NHTS [43] to address the more specific behaviours inherent to the city of Boulder. With both travel and charging session data, this is also a favourable setup to apply spatiotemporal models [9,18,32]. Considering that this dataset is continuously updated and holds recent data, it would also be interesting to apply models which were built precisely for operational use such as [147,151], in the hope of drawing conclusions that are consistent with real-world applications.
The Paris dataset holds customer identifier information which encourages vehicle-centric approaches. However, the small amount of data deters the use of models which leverage numerous parameters. Instead, statistical characterization techniques with unimodal distributions could yield a sufficient approximation of the phenomenon as proposed in [13,14], along with LM [148–150]. The remaining statistical characterization models (mixtures [13,24,123,133,134] and KDEs [135–139]) can capture diverse patterns and thus could be applied to medium-sized datasets. The Paris dataset could also be used to verify the consistency of simple queuing models as they usually struggle to find concrete applications [29,142].
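To illustrate why a unimodal distribution remains usable on a small dataset, the following minimal sketch fits a two-parameter log-normal distribution to a handful of session energies. The choice of the log-normal family and the sample values are assumptions made purely for illustration; they are not taken from the Paris dataset or from the articles cited.

```python
import math
from statistics import NormalDist

# Hypothetical session energies in kWh (illustrative values only).
energy_kwh = [6.2, 7.5, 8.1, 9.0, 9.4, 10.3, 11.8, 14.6]

# A log-normal is fitted by fitting a normal distribution to the log-values.
log_fit = NormalDist.from_samples([math.log(x) for x in energy_kwh])


def prob_exceeds(threshold_kwh):
    """Probability that a session demands more than threshold_kwh under the fit."""
    return 1.0 - log_fit.cdf(math.log(threshold_kwh))


median_kwh = math.exp(log_fit.mean)  # median of the fitted log-normal
p_over_12 = prob_exceeds(12.0)
```

Only two parameters are estimated, which is what keeps such a model stable with few observations; a mixture or a KDE would need more data to be estimated reliably.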
As Perth and Dundee are two neighbouring cities in the UK whose datasets share the same fields, it would be interesting to compare their charging behaviours. Despite their closeness, it is unlikely that there is a significant spatial interaction between them for public charging, so each city can be modelled independently. Thus, temporal processes produced in [15,37,140,141] would be well suited. Additionally, thanks to the medium size of both datasets, SVM models described in [8,33] could also be a good option here.

Discussion and Future Work
The purpose of this section is to highlight and discuss the current gaps and limitations from both open data and EV load models perspectives (Table 7). Sections 6.1 and 6.2 deal with data prospects and limitations while Sections 6.3 and 6.4 describe new ways of modelling and tie EV load models to optimization of charging schedules.

Table 7. Gaps and future work for EV load data and models (columns: Section, Keywords, References).

While the open data search provides visibility on charging session and traffic data, no repository merging both was found. Thus, it is likely that the standard will remain to manipulate separate datasets for charging and traffic data, as is already the case in the literature [39,41,48], at least in the near future. As such, different locations and different grains of data will still need to be leveraged in order to perform a complete data-driven spatiotemporal description of EV load.
Battery inputs are intrinsically complex to obtain. Collecting them would involve establishing an Internet of Things (IOT) between EVs, charging stations and controllers when considering a smart charging scenario [26,27]. This type of work is currently in progress [41] and the community could benefit from new types of information for EV load models in the near future. However, so far, the articles which include these variables simulate them from prior statistical distributions.
Some of the datasets presented in Section 3 provide unique identifiers for vehicles and even drivers' registered post codes [83]. However, data regulations (e.g., GDPR in Europe [164]) may prevent battery and travel inputs from being shared openly. Thus, more elaborate and complex models will be required in order to capture hidden information for disaggregated approaches.
The other variables of interest pinpointed in this review are easier to retrieve. For example, weather information can be obtained from the R package riem for a wide range of locations [55] or on the NOAA website [56]. If finer information is required, meteorological grid models can be used for that purpose. Economic and calendar variables can be tailored for each analysis depending on the grain chosen.
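As an indication of how calendar variables can be tailored to a chosen grain, the sketch below derives a few covariates at a daily grain. The function name, the selected features and the holiday set are hypothetical; an actual analysis would rely on the calendar of the location studied.

```python
from datetime import date, timedelta


def calendar_features(day, holidays=frozenset()):
    """Return simple calendar covariates for one day (daily grain)."""
    return {
        "day_of_week": day.weekday(),   # 0 = Monday ... 6 = Sunday
        "is_weekend": day.weekday() >= 5,
        "month": day.month,
        "is_holiday": day in holidays,
    }


# Hypothetical holiday set; a real analysis would use the local calendar.
holidays = frozenset({date(2019, 1, 1)})
start = date(2018, 12, 31)
features = [
    calendar_features(start + timedelta(days=i), holidays) for i in range(7)
]
```

For an hourly or sub-hourly grain, the same idea extends to features such as the hour of the day, which would then be joined to the load series on the timestamp.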

Other Types of Relevant Data
The open data search presented in Section 3 mainly focuses on charging session and traffic data. However, it is also possible to consider general electrical load open data [53]. Indeed, if a region switches to EVs in a given time period, this change can reflect on the regional load curve. In this context, EV load would be a latent or hidden variable contributing to the general electrical load.
In addition, grid network data [173] and big cities' electrical load data [165] are becoming increasingly available. Combining them with charging session and traffic data may lead to models which have a holistic and data-driven understanding of the reality.
To model the load of specific appliances with general electrical load data, Non-Intrusive Load Monitoring (NILM) methodologies have received a lot of interest in the related literature [174,175]. The question addressed is whether it is possible to identify and characterize EV load within a general electrical load curve [166,167].
Synthetic data can also be used in order to produce EV load models. In most research papers, simulators rely heavily on assumptions derived from travel surveys and not so much on real charging session data [14,15,140]. Nevertheless, a data-driven simulator was recently proposed in [134]; it was trained on real-world charging sessions and can thus represent real-world charging behaviours more accurately.
Finally, there are semi-open or closed data. Most of these closed datasets are related to residential load [90,176] as it is less feasible to retrieve them without raising data privacy concerns.

Composite Approaches
From a methodology perspective, it is interesting to note that very few stochastic process approaches used real data [16]. These models are usually theoretical and can be useful for mid-term or long-term scenarios but are less relevant for short-term forecasting. Alternatively, the machine learning and statistical characterization approaches presented were highly data-driven.
In the corpus of articles considered in this review, there exists no article that deals both with stochastic processes and machine learning algorithms in the context of EV load models. Thus, it would be interesting to compare them in terms of performance but also to assess what they can bring to each other in a composite model [168–170].
Furthermore, it was shown in many articles reviewed in Section 4 that the influx of vehicles at EVSEs is highly time dependent. Consequently, the homogeneous Poisson processes used in articles from the corpus will not be enough to capture the reality of drivers' behaviours [18]. More elaborate processes such as inhomogeneous Poisson or self-exciting point processes [177] have to be considered to account for this time dependence. Using these stochastic processes hand in hand with machine learning algorithms will foster consistency, conciseness and performance of EV load models.
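The time dependence of arrivals can be made concrete with a short simulation of an inhomogeneous Poisson process. The sketch below uses the classical thinning method, in which candidate events drawn from a homogeneous process are accepted with a probability proportional to the time-varying intensity. The sinusoidal intensity, its parameters and the function names are assumptions chosen for illustration only; they are not estimated from any dataset reviewed here.

```python
import math
import random


def arrival_rate(t_hours):
    """Hypothetical intensity (vehicles per hour) with a daily cycle peaking at 12:00."""
    return 3.0 + 2.0 * math.cos(2 * math.pi * (t_hours - 12.0) / 24.0)


def simulate_arrivals(rate, rate_max, horizon_hours, rng):
    """Simulate an inhomogeneous Poisson process on [0, horizon) by thinning.

    rate_max must be an upper bound of rate(t) over the whole horizon.
    """
    arrivals, t = [], 0.0
    while True:
        # Candidate events come from a homogeneous process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= horizon_hours:
            return arrivals
        # Keep each candidate with probability rate(t) / rate_max.
        if rng.random() <= rate(t) / rate_max:
            arrivals.append(t)


rng = random.Random(0)
arrivals = simulate_arrivals(
    arrival_rate, rate_max=5.0, horizon_hours=24.0 * 30, rng=rng
)
# Share of simulated arrivals occurring between 06:00 and 18:00.
day_share = sum(1 for t in arrivals if 6 <= t % 24 < 18) / len(arrivals)
```

With a constant intensity, half of the arrivals would fall in this 12 h window; the simulated share is markedly higher, which is the pattern a homogeneous process cannot reproduce. In a composite approach, the intensity function could itself be the output of a machine learning model.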
Finally, another gap brought to light in this review is the lack of work on stacking models or bottom-up approaches [12] which are indeed more costly from a computational perspective but can bring a deeper understanding of EV load.

Link with Optimization
As mentioned in the introduction, EV load models are part of a two-step process. Firstly, behaviours relating to EV load demand must be understood and then current schedules optimized depending on the aim (e.g., load flattening or load balancing). The articles introducing methodologies for optimizing charging schedules usually assume a clear knowledge of the future short-term demand. It is less common to see articles which account for the potential uncertainty of EV load models. This is also due to the fact that there has been less focus given to probabilistic EV load models which could yield confidence intervals for evaluating risks of surpassing the energy supply at a given time. Additionally, probabilistic forecasting proposes a more exhaustive representation of the demand as it does not solely focus on the mean demand.
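A minimal sketch of how a probabilistic forecast can be exploited for risk evaluation is given below. It assumes that the forecast is available as a set of simulated load scenarios for one future time step; the Gaussian scenario generator, the capacity value and the variable names are hypothetical and serve only to illustrate the computation.

```python
import random
from statistics import quantiles

rng = random.Random(1)
# Hypothetical Monte Carlo scenarios of aggregate EV load (kW) for one time step.
scenarios = [max(0.0, rng.gauss(120.0, 25.0)) for _ in range(5000)]

# The 5% and 95% empirical quantiles give a 90% prediction interval.
cuts = quantiles(scenarios, n=20)  # 19 cut points: 5%, 10%, ..., 95%
lower, upper = cuts[0], cuts[-1]

capacity_kw = 170.0  # hypothetical supply limit
# Estimated probability that the demand exceeds the available supply.
risk = sum(s > capacity_kw for s in scenarios) / len(scenarios)
```

A point forecast of the mean demand would report roughly 120 kW here and carry no information about the exceedance risk, whereas the scenario-based estimate quantifies it directly.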
Solutions which include both forecasting and optimization aspects in the same model or process are required [6,178]. Again, using the same data for this purpose is essential, as it enables the development of solutions by researchers specialized in different fields such as forecasting and optimization. To unify both, methodologies can also be developed using reinforcement learning [172]. In addition, specific losses related to the exploitation of probabilistic forecasts in smart charging strategies could be relevant [171].

Conclusions
In this paper, the reader is provided with a comprehensive list of open data that can be used to model EV load. Additionally, an organized review of EV load models is presented. Finally, six datasets are explored to provide recommendations on how they can be matched to the EV load models reviewed. The open data search focused on the top 14 countries of the EVI ranked by national EV market share. A total of 860+ open data repositories was covered which yielded more than 60 open datasets relevant for modelling EV load.
Across the literature, a wide spectrum of EV load models was reviewed. This includes statistical characterization models from parametric (unimodal distributions and mixtures) to non-parametric estimation (KDE). Furthermore, stochastic processes with purely temporal models, spatiotemporal models and queues were also included. Finally, machine learning models including LM, SVM, RF and ANN were reviewed. From the open data search, six datasets which have not been previously studied in the literature were considered. Recommendations were provided on how the models reviewed could be matched to each dataset.
Points to consider for future work involving EV associated data include: the regulatory landscape, accessibility and the ability to merge data sources (e.g., travel and charge). This will involve bridging gaps between various stakeholders to establish a comprehensive IOT. As for future work in EV load modelling, composite approaches and links with optimization are to be explored further.
We hope that this article will encourage the use of the open datasets and models reviewed in order to foster reproducible work and breakthroughs in the field of EV load modelling.

Acknowledgments: The authors would like to acknowledge the contribution of Benjamin Mousseau and Emma Chieusse-Gérard from EDF Energy who brought to our knowledge the existence of the Domestics UK dataset as well as Emma Yule from Edinburgh University who kindly proofread the article.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Table A1. Exhaustive list of models presented in Section 4 with the input dataset(s) used (if applicable), the approach taken (Aggregated, Vehicle-centric or EVSE-centric) and the output variable(s) modelled. For the input datasets, the data repositories or data reports are provided as references when they were clearly made available by the authors of the paper.