Earthquake Insurance in California, USA: What Does Community-Generated Big Data Reveal to Us?

Abstract: California has a high seismic hazard, as many historical and recent earthquakes remind us. To deal with potential future damaging earthquakes, a voluntary insurance system for residential properties is in force in the state. However, the insurance penetration rate is quite low. Bearing this in mind, the aim of this article is to ascertain whether Big Data can provide policymakers and stakeholders with useful information in view of future action plans on earthquake coverage. Therefore, we extracted and analyzed the online search interest in earthquake insurance over time (2004–2021) through Google Trends (GT), a website that explores the popularity of top search queries in Google Search across various regions and languages. We found that (1) the triggering of online searches stems primarily from the occurrence of earthquakes in California and neighboring areas as well as overseas regions, thus suggesting that the interest of users was guided by both direct and vicarious earthquake experiences. However, other natural hazards also come to people’s notice; (2) the length of the higher level of online attention spans from one day to one week, depending on the magnitude of the earthquakes, the place where they occur, the temporal proximity of other natural hazards, and so on; (3) users interested in earthquake insurance are also attentive to the features of the policies, first of all the price of coverage, and then their worth and practical benefits; (4) online interest in the time span analyzed fits fairly well with the real insurance policy underwritings recorded over the years. Based on the research outcomes, we propose the establishment of an observatory to monitor online behavior, suitable for supporting well-timed and geographically targeted information and communication action plans.


Introduction
California is one of the most earthquake-prone places on earth [1]. Its high seismic activity is caused by the San Andreas Fault system (SAFS), a complex of faults that show predominantly large-scale strike slip, which crosses almost the entire state from north to south. The SAFS consists mainly of the San Andreas Fault and several major branches [2,3].
To learn more about the SAFS, researchers have performed a stream of studies, thus improving knowledge of the seismic hazard and risk in the area [4]. However, the SAFS has caused many strong and damaging earthquakes in both historical and recent times.
Toppozada and Branum [5] listed 35 significant earthquakes in California for the 19th and 20th centuries. Among them are the earthquakes of 9 January 1857 at Fort Tejon (M 7.9), 21 October 1868 on the Hayward Fault (M 7.0), and 18 April 1906 in San Francisco (M 7.8) [5]. The last event, one of California's most famous earthquakes, caused directly or indirectly over 3000 deaths in San Francisco and USD 400,000,000 (in 1906 dollars) in losses from the earthquake and earthquake-triggered fires [6,7]. More recently, other earthquakes have occurred, such as the 18 October 1989 Loma Prieta (M 6.9), 17 January 1994 Reseda Boulevard, Northridge (M 6.7), and 6 July 2019 Ridgecrest (M 7.1) events [8].
Big Data Cogn. Comput. 2022, 6, 60
[...] (1) direct experience (i.e., first-hand involvement in suffering losses and direct physical perception of a hazard, such as an earthquake), (2) indirect experience (i.e., persons not directly involved in the hazard and who did not suffer any consequences, such as [...] volunteers travelling to an area affected by an earthquake), (3) vicarious experience (i.e., experience of disasters described by relatives/friends or media accounts of domestic and international disasters), and (4) life experience, not related to disaster experiences (i.e., unfavorable life experiences, such as accidents at work, illnesses, and car accidents). Regarding the role of these experiences, the literature recognizes direct experience as the most effective in encouraging preparation. However, both indirect and life experiences, even if less effective than first-hand involvement, also contribute to influencing preparedness [32,33]. Moreover, vicarious experience is recognized as a significant predisposing element [29].
The article is organized into three further main sections. Section 2 addresses the materials and methodology used to both retrieve and analyze the data. Section 3 includes the exploration of the results and their cross-correlated discussion. Section 4 draws conclusions with both the practical implications and limitations of the research in mind.

Google Trends: An Overview
Scholars increasingly consider Google Trends (GT) in their research. The analysis performed by the authors of this article in the Scopus citation index identified 1769 documents published from 2007 to 2021, with a maximum (501 documents) in 2021 (Figure 1). Approximately 30 subject areas were involved; the top three were medicine (25% of the total), computer science (12%), and social sciences and economics (10%). GT was used for real-time outbreak surveillance [34], healthcare [35], collective human behavior in financial markets [36], and tourism demand forecasting [37]. More recently, GT was also adopted to reveal dynamic patterns of community interest on floods [38], earthquakes [39], and earthquake insurance [40].
GT gives access to an almost unfiltered sample of real search requests made to the Google search engine. However, the advantages and limits of GT need to be emphasized [41].
According to Google Trends [42], the data sample is completely anonymized (no person is identified), ranked (the subject of the search queries is established) and combined (grouped). This allows us to show interest in a certain topic around the world or in specific cities. GT includes two search strategies: search term and search topic. On the one hand, search terms show matches of all terms in the query in a given language; on the other hand, topics are a set of words that share the same concept in any language. GT also reports search interest for different geographic areas (world, state, metropolitan areas, and cities). To this end, GT includes a GeoMap showing the areas where the searched term is popular during the selected period. In that map, darker shades show the places where the term has a higher probability of being searched by users. There are two samples of GT data that users can access: (1) real-time data, which is a sample for the past seven days, and (2) non-real-time data, which is a separate sample that can go from 2004 to 36 h prior to the search date.
A significant aspect to be considered is that the relative search volume (RSV) is normalized from 0 to 100 over the selected period, based on the maximum of the absolute activity volume during the same time span explored. Therefore, comparing search results related to different periods is not possible. In particular, when a period starting in 2004 is selected, the data are supplied at monthly intervals; for the past five years and for single years, the data are weekly; and for time intervals equal to or greater than 30 days but less than 90 days, the data provided by GT are on a daily basis.
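The consequence of this normalization can be illustrated with a minimal sketch. It assumes a simplified rule (each value divided by the peak of its own window and scaled to 100); the function name `relative_search_volume` and the raw counts are hypothetical, since GT does not expose absolute volumes and its exact processing is not described here.

```python
def relative_search_volume(counts):
    """Scale raw activity counts to a 0-100 index relative to the window's peak.

    Simplified assumption: GT's real pipeline is not reproduced here.
    """
    peak = max(counts)
    if peak == 0:
        # No activity at all in the window: every value stays at 0.
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Hypothetical absolute counts for two different periods (illustration only).
window_a = [20, 40, 80]
window_b = [200, 400, 800]

print(relative_search_volume(window_a))  # [25, 50, 100]
print(relative_search_volume(window_b))  # [25, 50, 100]
```

Both windows yield the same index although the second has ten times the activity, which is why RSV values taken from separately normalized periods cannot be compared directly.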
GT also includes the "Related Queries" box. This tool is useful for identifying the popular terms that accompany or follow the selected main search term. GT offers two options: "Most searched" and "Rising". In the first choice, the search results are visualized on a relative scale: 100 indicates the most searched query, 50 indicates a query with half the searches of the most searched query, and so on; in the second preference, we can categorize the queries whose search frequency has increased the most since the last period.

The Method
We analyzed the online trend of the earthquake insurance (EQI) topic for the state of California in the period from 1 January 2004 to 31 December 2021 (18 years). For the same period, we also considered the search trend analysis for the earthquake (EQ) topic.
The choice to use "Topics" as a search filter in GT was also suggested by the very diverse composition of the population of California, where no ethnic group constitutes an absolute majority of the over 39 million people: 39% are Latino, 35% white, 15% Asian American or Pacific Islander, 5% Black, 4% multi-ethnic, and others [43]. This implies different cultures and spoken languages, which can also be used to search the internet. Indeed, 44.2% of people speak languages other than English at home [44].
Once the search filter was identified, another point to define was the time resolution of the internet data. Indeed, bearing in mind that for wide time windows we can only obtain RSV on a monthly scale, it was necessary to adopt a strategy to obtain the higher-resolution (daily) data, useful for performing more inclusive and detailed analyses. For this purpose, we implemented the methodology proposed by Thompson et al. (2021) [38].
In detail, the procedure adopted consisted of four main steps: (1) to perform the search for the topics EQI and EQ for the entire period of 18 years (2004-2021); (2) to identify, for the period, the month with the highest search interest (100), assigning it a weighting of 1 and then rescaling the interest of all the other months of the 18 years accordingly; (3) to acquire the daily data for both topics for each of the 216 months that make up the 18 years examined; and (4) to scale the daily data of each month by multiplying the daily search volumes by the corresponding weighted monthly value. In this way, we obtained the daily series for both topics over the entire period (2004-2021). We retrieved data on 8 February 2022. All GT data (both EQ and EQI) were imported into a Microsoft Excel 2013 spreadsheet to be processed following the procedure described above. Once obtained, the daily data on both EQ and EQI were analyzed to evaluate potential correlations with earthquakes.
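Steps (2) and (4) of the procedure above can be sketched as follows. This is only an illustration under stated assumptions: the authors worked in a spreadsheet, the function name `rescale_daily` and the data layout (dictionaries keyed by "YYYY-MM") are hypothetical, and the numbers are invented toy values, not GT data.

```python
def rescale_daily(monthly_rsv, daily_rsv_by_month):
    """Combine one long-period monthly series with per-month daily series.

    monthly_rsv: {"YYYY-MM": RSV (0-100) from the full-period query}
    daily_rsv_by_month: {"YYYY-MM": [daily RSV (0-100) from that month's query]}
    Returns {"YYYY-MM": [daily values rescaled to a common reference]}.
    """
    peak = max(monthly_rsv.values())
    if peak == 0:
        raise ValueError("monthly series contains no search interest")
    rescaled = {}
    for month, daily in daily_rsv_by_month.items():
        # Step 2: the peak month gets weight 1, all others a fraction of it.
        weight = monthly_rsv[month] / peak
        # Step 4: multiply each daily value by its month's weight.
        rescaled[month] = [value * weight for value in daily]
    return rescaled


# Toy values for two months (three days each), for illustration only.
monthly = {"2017-08": 50, "2017-09": 100}
daily = {"2017-08": [0, 100, 40], "2017-09": [100, 20, 0]}

series = rescale_daily(monthly, daily)
print(series["2017-08"])  # [0.0, 50.0, 20.0]
print(series["2017-09"])  # [100.0, 20.0, 0.0]
```

Weighting each month by its share of the overall peak places the separately normalized daily windows on one scale, so that a daily maximum in a quiet month no longer appears as large as a daily maximum in the busiest month.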
To obtain data on earthquakes (date, time, epicentral area, magnitude, and community-determined intensity map (CDI)) that occurred in California and bordering areas, in the rest of the USA, and worldwide, the USGS Earthquake Catalogue was used as a reference [45]. The earthquakes that were recognized as triggering online interest in EQI, as explained in Section 3.2, were plotted to build thematic maps using ArcGIS 10.3 software.

From the USA to California: A Progressive Zoomed in Analysis
The GeoMap related to the online searches in EQI for the USA over the entire period (2004-2021) shows a quite complex picture (Figure 2). To take a closer look, we can compare the GeoMap both with the seismic hazard map of the USA and with the map of earthquakes (M ≥ 5) that occurred in the 18 years analyzed in this article (Figure 2). As Figure 2 shows, the GT EQI GeoMap matches the seismic hazard of the country to varying degrees. Online search interest prevails in the states of the west coast (from south to north: California, 100; Oregon, 53; and Washington, 66), where the seismic hazard is very high or moderate-high. However, numerous and even major earthquakes occurred in California in the 18 years, whereas no such events occurred in Oregon and Washington. In these two states, the interest in EQI was probably nurtured by the seismic activity that occurred in the bordering states, primarily California. Moving east, for Utah (70) we can observe an overlap between the seismic hazard map and the GeoMap. However, in Utah, a significant earthquake occurred during the 18 years. For the states of Idaho, Montana, and Wyoming, where there are large areas characterized by moderate and high seismic hazard, online attention fluctuates from zero to low. This difference can be explained by the circumstance that no earthquakes occurred, at least of moderate to high magnitude (Wyoming), or that they took place in sparsely populated areas (Idaho and Montana). Looking at the central United States, there is good agreement between the GeoMap and the hazard map. In fact, the states with low and low-moderate seismic hazard show no or low search activity (North Dakota, 0; South Dakota, 0; Nebraska, 8; Kansas, 22; and Texas, 20). In these states, no important seismic activity was recorded. In Oklahoma, the higher seismic hazard of the state corresponds well to a higher internet activity (41). However, some significant earthquakes occurred in the state. Conversely, the correlation between the GeoMap and the moderate-high seismic hazard of the central-eastern (Arkansas, Missouri, Illinois, and Tennessee) and eastern (South Carolina) states is low. Again, the differences can be explained, at least in part, by the low seismic activity in the period (2004-2021), as Figure 2 shows. Quite a similar consideration can be made for the northeastern states (Maine and New York).
These analyses seem to show that users' interest in EQI was not determined by their awareness of the seismic hazard of the territory (predisposing factor), but rather was driven by the occurrence of earthquakes (determining factor). This anticipates what we will now discuss in detail for California.
Focusing on this state, we can consider both the geographic and temporal patterns.
Regarding the first point, the 18-year GeoMap related to earthquake insurance provides evidence that interest was especially concentrated in Southern California (Figure 3). This seems consistent with the circumstance that the Southern San Andreas Fault strand is the most likely to host a large earthquake in the next three decades, as confirmed by the recent Third Uniform California Earthquake Rupture Forecast (UCERF3) [47]. More detailed analyses will be discussed later. Regarding the second point (temporal patterns), we can see that the earthquake insurance interest grew over time (Table 1 and Figure 4). Now, we consider this complex pattern in detail.
The period of 18 years can be subdivided into three main temporal windows, each of which is related to a different level of interest: the first period from 2004 to 2009, the second period from 2010 to 2015, and the third period from 2016 to 2021.
Although the interest was low in the first six years, the number of days on which online searches were carried out tripled in the second period and increased fivefold in the third period, when compared to the first time interval. The highest activity in the third period is also evident from the total RSV (7373), which in the third period alone represents approximately 55% of the total scores (13,490) over the 18 years analyzed (Table 1). Furthermore, the third period also saw the lowest percentage of days with no online activity (RSV = 0).
A quick (visual) comparison between the EQI and EQ online data (Figure 4) seems to show a certain similarity in trends, which is consistent with the circumstance that among the EQI-related queries (most searched option), the search for the term "earthquake" had the highest relative frequency (100). This would suggest that interest in online searches on insurance was at least to some extent stimulated by earthquake occurrence, even though the statistical correlation between the earthquakes (M ≥ 5.0) that occurred in California as well as bordering areas and the online interest is not so close (Table 2).
Table 2. Spearman rank correlation and Pearson linear correlation between (1) GT earthquake insurance and GT earthquake daily data; (2) GT earthquake and the magnitude of earthquakes (M ≥ 5.0) that occurred in California and bordering areas; (3) GT earthquake insurance and the magnitude of earthquakes that occurred in California. In cases of more than one earthquake on the same day, the highest magnitude was considered.

However, the values of the correlation coefficients need to be read bearing in mind that (1) earthquake data do not include events outside California and bordering areas, which contributed to online interest, as will be explained later; (2) the analysis does not consider the occurrence of natural hazards other than earthquakes (Hurricane Harvey), which had an effect on the insurance search, as discussed later; (3) the occurrence of an earthquake can have direct-immediate and/or indirect-delayed impacts, stimulating online searches in the medium term and causing, as we will discuss later, online attention to insurance that can continue for several days or weeks, as in the case of the Mexico earthquake of 2017, even without the occurrence of further earthquakes; (4) an earthquake can produce effects on online searches that increase the day after it occurred, depending on the time of the event (at night); (5) an earthquake can trigger online effects only after the second or third shock, once the seismic events have generated repeated experience(s) in people; (6) the population of California is rather clustered in large urban agglomerations located in the central and western part of the state (Figure 5); therefore, the lack of enough "human sensors" in the eastern areas of the state able to record some of the earthquakes that occur there can prevent sizeable online activity; and (7) the general interest in the subject of insurance is very low in the first period (only 8% of days with RSV > 0) and rather low in the second, and then clearly rises in the third period (41% of days with RSV > 0), for which the correlation between the occurrence of earthquakes and online searches on insurance clearly increases.
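For readers who wish to reproduce the kind of coefficients named in the caption of Table 2, a minimal pure-Python sketch of the Pearson linear correlation and the Spearman rank correlation (with average ranks for ties) is given below. The function names and the two short series are illustrative assumptions, not the authors' code or data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented daily RSV values for two topics (illustration only).
eqi_daily = [0, 5, 18, 72, 26]
eq_daily = [1, 9, 34, 100, 40]
print(round(pearson(eqi_daily, eq_daily), 3))
print(round(spearman(eqi_daily, eq_daily), 3))
```

Because many days have RSV = 0, ties are frequent in series of this kind, which is why the rank function averages tied positions rather than assigning them arbitrarily.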

A Closer Analysis: The Earthquakes and Other Natural Hazards as Factors Sparking Off Interest
To analyze what ramifications the occurrence of earthquakes had on online behavior, we considered EQ and EQI online data comparatively. In particular, when we ascertained a rapid growth in daily online interest for both the EQ and EQI topics, we checked for potential concurrent earthquake(s) firstly in California and bordering areas, then in the USA, and lastly overseas. When earthquake(s) had occurred, they were considered as triggering the insurance searches. Spikes of online activity on EQI were also related to possible extreme natural events other than earthquakes, such as Hurricane Harvey in August 2017, as we will see later.
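The screening logic described above can be sketched in code, purely as an illustration. The article does not state a numerical criterion for "rapid growth", so the `min_jump` threshold, the one-day look-back, the function names, and all data below are hypothetical assumptions.

```python
from datetime import date, timedelta

# Search order stated in the text: local events first, then national, then overseas.
REGION_PRIORITY = ["California and bordering areas", "USA", "overseas"]


def joint_spikes(eq, eqi, min_jump=10):
    """Days on which both series rise by at least min_jump over the previous day.

    min_jump is a hypothetical threshold, not a value taken from the article.
    """
    spikes = []
    for day in sorted(eqi):
        prev = day - timedelta(days=1)
        if prev in eqi and prev in eq and day in eq:
            if eqi[day] - eqi[prev] >= min_jump and eq[day] - eq[prev] >= min_jump:
                spikes.append(day)
    return spikes


def candidate_trigger(day, catalogue, lookback_days=1):
    """Strongest event near the spike day, honoring the regional search order.

    The look-back allows for events late on the previous day (assumption).
    """
    start = day - timedelta(days=lookback_days)
    nearby = [ev for ev in catalogue if start <= ev["date"] <= day]
    for region in REGION_PRIORITY:
        matches = [ev for ev in nearby if ev["region"] == region]
        if matches:
            return max(matches, key=lambda ev: ev["magnitude"])
    return None


# Toy series and catalogue (invented values, for illustration only).
d1, d2, d3 = date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)
eq = {d1: 2, d2: 40, d3: 35}
eqi = {d1: 0, d2: 15, d3: 30}
catalogue = [
    {"name": "toy overseas event", "date": d2, "region": "overseas", "magnitude": 7.0},
    {"name": "toy local event", "date": d1,
     "region": "California and bordering areas", "magnitude": 4.5},
]

for day in joint_spikes(eq, eqi):
    print(day, candidate_trigger(day, catalogue)["name"])
```

In this toy case only the second day qualifies as a joint spike, and the local event is preferred over the stronger overseas one because of the regional search order.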
We classified the EQI online attention into four levels, from low to very high, according to Table 3. We also analyzed the duration of online interest. Table 4 and Figure 5 summarize all these analyses.
Table 3. Criteria followed to attribute the level of online attention for earthquake insurance (columns: RSV Range; Level of Attention).
In order to make the analysis more complete, we also made some inferences on online EQI data and real data on insurance underwritings.
Overall, we identified 38 events, including earthquakes and other natural hazards. These acted as triggers of online activity on earthquake insurance searches. In detail, 4 events fell in the first period, 10 in the second period, and 24 in the third period. Taken together, the occurrence of earthquakes or other natural hazards caused a growth of interest that lasted from one day to over two weeks, even if with discontinuous activity.
The first event on the list is the Yucaipa earthquake (M 4.9), which caused one day of moderate interest in EQI. Afterwards, the Calipatria earthquake (M 5.1) saw a rise of interest in EQI for two days with an overall level of attention that can be assessed as moderate even though, according to the USGS earthquake catalogue, no people were killed, missing, or injured, and no buildings were damaged or destroyed. This finding is in accordance with both the literature that considers the effects of "no-loss" experience that nonetheless raises the salience of seismic risk [48] and the results of other research in earthquake insurance [40].
The next earthquake for which there was a brief flash of interest (1 day) was the event that occurred in Peru on 15 August 2007 (M 8.0). The main event occurred at 23:40:57 (UTC), thus resulting in a low increase in EQ interest (9) the next day, accompanied by a sudden increase in searches in EQI (18) compared with the preceding period of over three weeks, during which there had been no online activity at all.
However, as noted above, interest in the first period was the lowest over the 18 years. This is in accordance with the low policy underwriting increase for almost the same period (2005-2009). In fact, the subscription data indicate that the total increase in policies sold was less than 50,000 from 2005 to 2009, with an average annual increase of only approximately 12,000 subscriptions, one of the lowest rates in 2005-2021 (Figure 6).
Regarding the second period, on 4 April 2010, a high-magnitude earthquake (M 7.2) occurred in Mexico, approximately 50 kilometers from the California border. The earthquake caused a rapid surge of interest in EQ (34) on the same day. However, there was no direct increase in interest in EQI. The following day (5 April), a seismic sequence started in an area located north of the epicenter of the M 7.2 earthquake, near Ocotillo (California). The sequence continued until 7 April. These earthquakes, with a magnitude between 4.6 and 5.0, sustained the interest in EQ and contributed to stimulating online activity in EQI, with an RSV increasing for two days (RSV1 = 18; RSV2 = 19).
The disastrous Japan earthquake-tsunami of 11 March 2011 (Great Tohoku earthquake, M 9.1) drew online attention for approximately 10 days in EQ and 11 days in EQI (in the latter case, the values fluctuated between RSV = 39 on the day following the earthquake and RSV = 31 for 28 March). The high search volumes were most likely due to the wide coverage in the international media, which stimulated interest in earthquake insurance through vicarious experience [29].
Real data on insurance underwritings appear to be in good agreement with the online search data for the second period (Figure 6). In fact, the maximum increase in subscriptions was recorded in 2011, the year in which the maximum volume of online searches (RSV = 1096) of the entire second period was documented (RSV mean = 642). GT data for 2014 also agree well with the policies taken out. In fact, 2014 saw the second most significant increase in policies in the second period (+32,000 contracts compared to 2013), in line with the total volume of online searches for EQI (RSV = 808).
The third period started with two low-magnitude earthquakes whose epicenters were in California, the 21 July 2015 Fremont and the 25 July 2015 Fontana events, for which a low interest was recorded.
From 29 August to 5 September 2017 (8 days), interest in insurance does not seem to be directly linked to earthquake(s). This can be inferred from both the low daily volumes in searches for earthquakes (≤2) and the absence of significant earthquakes, in California or anywhere in the world, capable of capturing the attention of internet users for so long. The higher level of interest with respect to the previous months was likely driven by increased awareness of disaster insurance following the devastating Hurricane Harvey, which made landfall first in Texas and then in Louisiana between 25 and 31 August 2017. Hurricane Harvey, classified as Category 4, caused more than USD 125 billion in damages [51,52], affecting 203,000 homes, of which 12,700 were destroyed [53].
As a result of the huge losses, also caused by the widespread flooding that started on 26-27 August, and of the wide media coverage that Harvey received, a spike in searches on flood insurance occurred in California between the end of August and the beginning of September (week of 27 August-2 September) (Figure 7). In approximately the same period (between 29 August 2017, a few days after the start of the landfall and flooding, and 5 September 2017), the searches for flood insurance likely had a "drag effect", triggering the increase in online activity on EQI, as discussed before. It is therefore interesting to note that the period saw an interest probably driven by vicarious experiences transferred between different natural hazards, a phenomenon that does not seem to be reported in the literature, as far as the authors are aware.
Continuing the analysis, also in September 2017, two very strong earthquakes occurred in Mexico: the earthquake of 8 September 2017 in Chiapas (M 8.2) and that of 19 September 2017 in Matzaco (M 7.1). These two earthquakes resulted in a sharp increase in EQ interest for two and six days, respectively. The two earthquakes also triggered a growth of online searches for EQI for at least eight days in the case of the Chiapas earthquake (from 8 September, RSV = 8, to 15 September, RSV = 25) and seven days for the Matzaco earthquake (from 19 September, RSV = 26, to 25 September, RSV = 19, with the maximum value on 20 September, RSV = 72). For this last earthquake, the EQI interest could have been "sustained" by another seismic event (7 km NNE of Ixtepec, Mexico, M = 6.1), which occurred on 23 September, stimulating internet activity until 25 September. The earthquakes had wide coverage in the international media, thus nourishing the interest in EQI. Furthermore, the search interest should also be seen in light of the ethnic composition of the California population. In fact, according to the 2014 estimates, California's population includes 12.5 million people of Mexican origin [54]. Therefore, a more complex vicarious experience most likely influenced the triggering of online searches. As defined above, vicarious experiences occur when a person is exposed to secondary sources of information about the disaster [55,56]. Thus, in the case of the September 2017 Mexican earthquakes, media, relatives, and friends plausibly guided the vicarious experience. However, according to what we discussed before, Hurricane Harvey could also have contributed to fueling the interest in searching for EQI after the 2017 Mexican earthquakes. This would expand what other authors found, namely, that having frequent thoughts about earthquakes (natural hazards) induces more preparation and that distant but temporally close events can have positive effects on judgements about preparedness needs [56].
Moreover, after 26 September, when the direct effect of the earthquakes on EQI searches ended, interest continued for quite a long period, albeit discontinuously.
... data searches for the second period (Figure 6). In fact, the maximum increase in subscriptions was recorded in 2011, the year in which the maximum volume of online searches of the entire second period was documented (RSV = 1096; period mean RSV = 642). GT data for 2014 also agree well with policy underwritings. In fact, 2014 saw the second most significant increase in policies in the second period (+32,000 contracts compared to 2013), consistent with the total volume of online searches for EQI (RSV = 808).
The third period started with two low-magnitude earthquakes whose epicenters were in California, the 21 July 2015 Fremont and the 25 July 2015 Fontana events, for which low interest was recorded.
From 29 August to 5 September 2017 (8 days), interest in insurance seems not to be directly linked to earthquakes. This can be inferred from both the low daily volumes of searches for earthquakes (≤2) and the absence of significant earthquakes, in California or elsewhere in the world, capable of holding the attention of internet users for so long. The higher level of interest with respect to the previous months was likely driven by increased awareness of disaster insurance following the devastating Hurricane Harvey, which made landfall first in Texas and then in Louisiana between 25 and 31 August 2017. Hurricane Harvey, classified as Category 4, caused more than USD 125 billion in damages [51,52], affecting 203,000 homes, of which 12,700 were destroyed [53].
Interest continued in 2018, even without relevant earthquakes. In fact, the year saw 158 days (44%) with RSV > 0 and a total search volume equal to 63% of that of 2017, showing that people continued to search for earthquake insurance. This online behavior was also stimulated by some earthquakes of low-to-moderate magnitude that occurred in California, such as those of 5 April 2018 in Santa Cruz (M 5.3) and 4 July 2018 in Berkeley (M 4.4).
Considering the online data of both 2017 and 2018, one might wonder whether searching for information and real preparedness actions were distinct processes [57] or whether coverage underwritings followed online searches. Indeed, online search data are in good agreement with the number of policies signed in both 2017 and 2018 (Figure 6). In 2017, sales of policies increased by over 100,000 with respect to the previous year, the highest annual increase in the 2002-2017 period (based on both CEA and non-CEA contracts). In addition, approximately 270,000 more policies were taken out in 2018 than in 2017, the highest yearly increase ever recorded in the 2002-2020 period. The fact that the largest GT search volumes were recorded in 2017 and not in 2018, the year in which the largest number of contracts was signed, does not seem inconsistent. In fact, the contracts signed in 2018 were probably the outcome of online searches carried out both in the last quarter of 2017, after the September earthquakes in Mexico, and in 2018. From this point of view, the signing of contracts seems to derive, at least in part, from a judgement that matured over the months, during which users probably acquired more in-depth information on coverage, as also suggested by the related queries discussed below. Therefore, the earthquakes in Mexico appear to have acted as a strong stimulus for increasing policy underwritings; thus, real actions followed intentions. However, the "drag effect" of Hurricane Harvey could also have helped boost preparedness motivation.
The Ridgecrest (Southern California) sequence began on 4 July 2019 and culminated in the earthquake of 6 July 2019 (M 7.1). According to the USGS [58], the latter event injured 5 people; 50 homes were structurally damaged, and 4 homes were damaged by fires from broken gas lines. Water lines were broken, and power outages occurred in the Ridgecrest-Trona area; additional cracks and landslides occurred on California State Route 178. Damage was estimated to be in excess of USD 100 million. The sequence affected EQ searches from 4 to 7 July, with the maximum (RSV = 100) on 6 July, and EQI searches from 4 July (RSV = 20) to 11 July (RSV = 21). The highest EQI peak (RSV = 100) of the 18 years investigated occurred on 6 July, the day of the highest-magnitude event (M 7.1), when the highest peak in EQ interest was also recorded.
After the Ridgecrest sequence, the next earthquakes that caused immediate EQI interest were two events that occurred on 15 October 2019. However, in a period of approximately three months between the Ridgecrest sequence and the two earthquakes of October 2019, approximately 50% of the days saw search volumes above zero, even without the spur of earthquake occurrence(s).
Overall, the correlations between the current insurance policy data and online interest in earthquake insurance, already found in other national contexts such as Italy [40], are consistent with the line of research that analyzes the effects of communicating earthquakes and earthquake risk, chiefly through the media, and that shows the relationship between information dissemination and people's decisions to buy a disaster policy [60,61].
Supporting evidence for the deduction that the underwriting of policies likely followed the online searches is found in the GT-related queries of the third period (2016-2021) (Figure 8). These show that the searches with a sudden or high increase are those aimed at obtaining specific information, such as insurance agents and the cost of coverage. Regarding the latter, the interest in insurance triggered by earthquake occurrences seems to have sustained the buying of policies, making less relevant the main obstacle to holding a policy in California, i.e., the price of the coverage [17]. However, considering the high user interest in price, stakeholders should also consider this issue in relation to buyers' ability to compare different insurers, thus attempting to overcome those institutional features of the California insurance market that produce market frictions capable of preventing comparison shopping [62]. Beyond cost information, users also searched for additional information on insurance, attempting to understand whether coverage can be a suitable solution to compensate for losses. Examples are search terms such as 'earthquake insurance California worth it' or searches with the same meaning such as 'should I get earthquake insurance' (Figure 8). All of these consumer perspectives should be considered by suppliers of insurance when planning information and communication campaigns on earthquake insurance, which should show the pros and cons of coverage. This need is in line with the findings of some studies on the life and nonlife insurance sectors, which revealed that access to useful and accurate information is a key element for informed decision making [63] and that consumer awareness and understanding of insurance products are among the important tools to help consumers make better decisions [64].
In summary, GT data suggest that earthquakes and other natural hazards stimulate online interest in earthquake insurance through different types of user experience, namely direct and vicarious. Most likely, in some cases online interest was followed by real action, namely the underwriting of insurance contracts. Figure 9 shows the possible operative flow of internet users.

Conclusions, Perspectives and Limitations
This article analyzed online interest in earthquake insurance in California. The research made use of Google Trends, the online tool that allows anyone to visualize and explore trends in people's search behavior within Google Search. The analysis period covered all available whole years, from 2004 to 2021. For this period, we retrieved the monthly data and processed them to derive a daily trend over the 18 years.
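As an illustration of how daily values can be put on a common scale, the following minimal Python sketch rescales daily values, which GT normalizes to 0-100 within each short retrieval window, by the corresponding monthly value of the long-run series. It is a rough approximation under stated assumptions and is not necessarily the procedure adopted in this study; the function name `rescale_daily`, the dictionary layout, and the final renormalization are hypothetical choices.

```python
def rescale_daily(daily_by_month, monthly_rsv):
    """Put per-month daily RSV values on a common 0-100 scale.

    daily_by_month: dict "YYYY-MM" -> list of daily values, each list
                    scaled 0-100 within its own month (assumed layout)
    monthly_rsv:    dict "YYYY-MM" -> monthly value from the long-run series
    Returns a dict "YYYY-MM" -> list of rescaled daily values.
    """
    adjusted = {}
    for month, days in daily_by_month.items():
        # Weight each month's daily values by the month's relative volume.
        weight = monthly_rsv[month] / 100.0
        adjusted[month] = [d * weight for d in days]

    # Renormalize so that the highest day of the whole series equals 100.
    peak = max((v for days in adjusted.values() for v in days), default=0)
    if peak == 0:
        return adjusted
    return {
        month: [round(100 * v / peak) for v in days]
        for month, days in adjusted.items()
    }
```

With this weighting, a day that is the maximum of a quiet month ends up well below a day that is the maximum of a busy month, which is the property needed to compare peaks across years.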
The research showed that interest in insurance varied greatly over time, increasing in particular during the third period (2016-2021). Numerous seismic events triggered and fed the online interest. These earthquakes originated in California and contiguous areas as well as overseas, such as the 2011 Japan and 2017 Mexico events, suggesting that users' interest was motivated by both direct and vicarious earthquake experiences.
After the 2017 Mexico earthquakes, we observed a "drag effect" that persisted over time. Most likely, part of these online searches then fed into the sharp increase in contract underwriting recorded in real data for both 2017 and 2018. However, other natural hazards, such as Hurricane Harvey, probably contributed to fueling the interest in insurance and the purchase of policies, whose real data fit quite well with the GT search volumes over time.
The duration of the peak of interest varied from a minimum of one day to a maximum of one week, although it can exceed three weeks for extreme natural events of exceptional magnitude and impact on human systems, as occurred after the 2011 Japan earthquake and tsunami.
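The duration of a peak of interest can be measured, for example, as the number of consecutive days, starting from the event day, on which the RSV stays above a chosen baseline. The short Python sketch below shows one possible way to operationalize this; the function name `elevated_run_length` and the baseline value are assumptions made for illustration, and the series in the example is made up and is not GT data.

```python
def elevated_run_length(rsv, start, baseline):
    """Count consecutive days, from index `start`, with RSV above `baseline`."""
    length = 0
    for value in rsv[start:]:
        if value <= baseline:
            break  # interest has returned to the baseline level
        length += 1
    return length


# Toy daily series (hypothetical values): the event occurs on day index 2.
toy_rsv = [2, 3, 20, 45, 100, 60, 30, 8, 2]
days_elevated = elevated_run_length(toy_rsv, start=2, baseline=10)
```

The result depends strongly on the baseline chosen, so durations obtained this way are comparable only when the same baseline is applied to all events.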
To benefit from the advantages and opportunities offered by (free) Big Data, it could be useful to set up an observatory that monitors online user behavior through GT in order to evaluate, especially at or around the time of seismic events or other extreme natural events, how online behavior and interest in insurance change in California over time. Furthermore, the changes can also be monitored at the level of metropolitan areas and cities, taking advantage of the GeoMaps that Google Trends makes available to the community. In this way, stakeholders, policymakers, and insurance companies could set up more effective and targeted communication campaigns in the most appropriate geographic areas, in a timely manner, to capitalize on the rise of interest in the subject. From this point of view, GT also suggests that users are interested in the costs of the policies, where to buy them, and the pros and cons of coverage.
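A monitoring routine of this kind could, for instance, flag the days on which search interest rises well above its recent level. The following Python sketch is only one simple possibility, not a design proposed in this study: it compares each day's RSV with the mean plus `k` standard deviations of the preceding `window` days. The function name `flag_spikes` and the default values of `window` and `k` are hypothetical and would need tuning on real GT data.

```python
from statistics import mean, pstdev


def flag_spikes(rsv, window=28, k=3.0):
    """Return indices of days whose RSV exceeds a trailing-window threshold.

    A day is flagged when its value is greater than the mean of the
    previous `window` days plus `k` population standard deviations.
    """
    flagged = []
    for i in range(window, len(rsv)):
        history = rsv[i - window:i]
        threshold = mean(history) + k * pstdev(history)
        if rsv[i] > threshold:
            flagged.append(i)
    return flagged
```

Because the threshold adapts to the trailing window, a large spike raises the threshold for the following days; a fixed baseline may therefore be preferable when the full length of an episode, rather than its onset, is of interest.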
However, this research has some limitations that readers should bear in mind. For example, the way in which GT generates data is not clear, since GT does not supply comprehensive technical documentation. Furthermore, GT excludes repeated queries from the same users over a short time period to reduce the number of continuous searches. In addition, considering that GT supplies only relative volumes of searches, we cannot identify how many people actually searched for earthquake insurance; thus, we can only hypothesize a correlation between GT volumes and real data (insurance underwritings). Further research in the socioeconomic field, for example through the administration of ad hoc questionnaires prepared for policyholders, could address these limitations. Therefore, while the research presented here provides positive and encouraging results, interdisciplinary research efforts are needed to take full advantage of Big Data in support of practical actions.