Big Data and Climate Change

Climate science, as a data-intensive subject, has been overwhelmingly affected by the era of big data and the relevant technological revolutions. The big successes of big data analytics in diverse areas over the past decade have also raised expectations of big data and its efficacy on the big problem of climate change. As an emerging topic, climate change has been at the forefront of big climate data analytics implementations, and extensive research has been carried out covering a variety of topics. This paper aims to present an outlook of big data in climate change studies over recent years by investigating and summarising the current status of big data applications in climate change related studies. It is also expected to serve as a one-stop reference directory for researchers and stakeholders, giving an overview of this trending subject at a glance, which can be useful in guiding future research and improvements in the exploitation of big climate data.


Introduction
Big data analytics have been rapidly developing along with the emerging needs for big data technologies in numerous subjects (see, for example, [1,2]). The accessibility, availability and exponentially growing quantity of big data have further promoted the corresponding technological advancements and practical implementations. Earth is a complex dynamical system [3]; therefore, big data analytics have encountered more challenges in climate science than in other subjects, despite the extensive resources of big climate data. Climate change, as an emerging topic and also a data-intensive subject, has been a research focus of big data scholars over the past several decades [4,5]. Extensive big data analytics applications have been carried out on big climate data, while the Internet of Things, cloud computing, big data tools to investigate climate, as well as intelligent analytics platforms and new technological progressions, have further emphasized its significance and possible impacts on climate science and big data science development (see, for example, [6,7]).
Given the context of combating climate change, existing research has applied big data analytics mainly in the aspects of energy efficiency, intelligent agriculture, smart urban planning, weather forecasting, natural disaster management, etc. Although this is not a new subject overall and there is a large amount of existing literature, to the best of our knowledge there is no recent review that specifically investigates the topic of big data in climate change, not to mention that novel developments are progressing rapidly every day along with technological advancements. Therefore, this paper contributes to the existing literature by providing an up-to-date overview of big data applications in climate change related studies at a glance, drawing on the most recently published research that reflects the cutting edge of this topic. It is of note that over 80% of the listed applications were published after 2016, which makes this review the latest comprehensive review of big data in climate change and distinguishes it from the previously existing literature. This paper also contributes by serving as a one-stop directory for researchers to gain an up-to-date overview of this topic. Furthermore, we aim to summarize the mainstream research practice in this domain, and also seek to identify the non-mainstream applications that lack thorough exploration. It is expected that, by providing this comprehensive review, both researchers and practitioners can gain better knowledge of the current research trends and identify research gaps with valuable potential.
As can be seen in Figure 1, the applications of big data in climate change have two fundamental elements: the big climate data resources and the big data analytics techniques. We classify these studies by means of value creation as well as by the specific topic of application. For convenient access to applications and clear guidance, big data in climate change is summarized as functioning mainly in four aspects of value creation: observing and monitoring, understanding, predicting, and optimizing, whilst the applications are grouped into five topics: energy efficiency and intelligence; smart farming, agriculture and forestry; sustainable urban planning and infrastructure; natural disaster and disease assessment; and other advanced supports. The remainder of this review paper is organized as follows. The values of big data to climate change study are summarized in Section 2 along with the trends of this focused topic. Section 3 presents a detailed review of the big data applications in climate change studies by topic. Finally, the paper concludes in Section 4 with current challenges and directions of future research.

Observing and Monitoring
One of the insights big data can bring is a thorough view of the realities captured in the large volume of data recorded. The exceptionally large sources of data contain significantly useful information and are also the fundamental asset of big data analyses. Monitoring the climate system is critical for better understanding the interactions within the system and its drivers. Moreover, it is also beneficial for us to know the changes that may occur due to global warming [3]. Therefore, observing and monitoring can be considered the fundamental value that big data brings when it is incorporated into climate change studies.
In order to obtain thorough observations and comprehensive parameters of climate change, earth observation technology has played a significant part over the past decades [8]. A multi-dimensional big data system has been established and is still rapidly developing, which has enabled us to observe and monitor changes in diverse earth and climate parameters on a global scale. According to [3], climate data generally have four different sources: in situ, remotely sensed, model output and paleoclimatic. A more detailed review of the climate data sources and corresponding features can be found in [9]. Later, Guo et al. [8] thoroughly reviewed the Earth observation big data sources and relevant programs; for instance, the satellites that are working in climate change research and their functions, the remotely sensed oceanographic parameters and representative sensors, the essential climate variables defined by the United Nations Framework Convention on Climate Change, the atmospheric parameters provided by different international agencies, etc. Specifically, Sun et al. [10] provided a review of the global precipitation data sets regarding their sources and comparisons.

Understanding, Predicting and Optimizing
Climate science investigates tremendous global-scale changes across various observations/parameters, and its 3Hs feature (high dimension, high complexity and high uncertainty) has made it a great playground for big data researchers to explore and analyse even before proceeding to data mining. Besides building up the multidimensional system for collecting and monitoring climate change data, big data has also promoted the rapid progression of data-intensive analytics in climate change related studies. Here, we briefly categorize them into the aspects of understanding, predicting and optimizing.
Understanding the big data (or data empathy, according to [3]) is a challenging task considering its 5Vs feature (volume, variety, velocity, veracity and value). Revealing the hidden valuable information from big data requires adequate knowledge of the purposes of the corresponding data as well as the techniques/methods for collecting the data. As this is a trending and emerging topic, big data researchers who are also interested in climate sciences have access to abundant established resources, for instance, the Global Climate Observing System (GCOS), the Earth System Grid Federation (esgf.llnl.gov), the National Center for Atmospheric Research (ncar.ucar.edu), United Nations Global Pulse (unglobalpulse.org), the Climate Data Guide (climatedataguide.ucar.edu), NASA Global Climate Change (climate.nasa.gov), the NASA Center for Climate Simulation (nccs.nasa.gov) and many other international and national climate monitoring and analysing institutions around the world. A detailed report that introduces the core of global scale climate research and cyber-infrastructure can be found in [11].
The abundant resources above have enabled us to gain knowledge of what the big climate data are, how the data are collected, and what the data can be used for. However, these barely scratch the surface of big data analytics. Big data have also been playing a vital role in prediction when incorporated with climate science, for instance in weather forecasting, natural disaster monitoring and early warning, energy consumption forecasting, traffic forecasting, etc. Applying the corresponding data mining techniques (a detailed introduction to data mining techniques can be found in [12,13]) allows knowledge discovery of potential relationships and causal inferences, which further contribute to modelling and prediction [14].
Accurate forecasts can aid in adaptive policy making in relation to climate change, whilst the value creation feature of big data puts emphasis on optimizing. Being able to understand and predict based on sufficient knowledge extracted from big data, or to draw inferences across different cases/applications, is relatively straightforward. However, optimization requires comprehensive theoretical understanding as well as adequate big data analytics skills to structure the optimal model/infrastructure so as to maximise performance, efficiency and utility, or, in some cases, to achieve sustainable development. In recent years, extensive relevant applications have been carried out on energy efficiency management, natural resource management, smart grids, smart farming, etc. More details of applications are provided in Section 3.

Trends
Google Trends provides a real-time index of interest for keywords that people search worldwide, and it has become a great tool for both academic research and practical implementations in marketing, journalism, entrepreneurship, etc. As can be seen in Figure 2, the worldwide Google Trends interest index for big data over time is generally between 80 and 100, with 100 being the peak popularity. It reflects the emerging and trending topic of big data and its relevant terms over recent years. Although climate change has an average interest index of around 40 over time, the rapid rise in interest from 20, with a wide span of over 60 within three years, also confirms the increasing popularity of and growing attention to climate. Yaqoob et al. (2016) [15] presented the emerging trends and sources of big data, i.e., the Internet of Things [16], multimedia [17] and social media [18,19]. The authors also summarized the state-of-the-art big data processing technologies by categorizing them into batch based and stream based processing technologies (a more detailed introduction to each technology and a list of big data vendors can be found in [20]; a brief summary of the open source tools and initiatives can be seen in [21]). Another tool, EnviroAtlas [22], developed by the United States Environmental Protection Agency in 2015, is a web-based ecosystem service framework that brings together environmental, economic and demographic data.
In regard to big data analytics in climate science, some recent developments focus on the introduction of different platforms/services that adopt advanced technological progressions like cloud computing [23], in-memory processing, real-time processing, etc. Schnase et al. [24] introduced the Modern-Era Retrospective Analysis for Research and Applications (MERRA) analytic services, which enable cloud computing and analytics-as-a-service for climate science. These services provide complete life cycle management of large scale scientific repositories and incorporate MapReduce analytics for research and applications targeting a broad range of users. Another platform, ClimateSpark, proposed by Hu et al. [25], is an in-memory distributed computing framework that aims to facilitate big climate data analytics and improve computational efficiency. Manogaran et al. [26] proposed the spatial cumulative sum algorithm, to be embedded in a scalable big data processing framework for seasonal climate change detection; the new algorithm was validated and outperformed the existing change detection approaches. Later, in [27], the authors introduced the In-Mapper based combiner algorithm, which achieved higher performance than the widely used MapReduce algorithm in processing big climate data. The proposed algorithm was applied to seek correlations between climate parameters and the incidence of dengue by integrating climate and health data.
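To make the idea of combining inside the mapper more concrete, the following is a minimal single-process sketch of the general in-mapper combining pattern from MapReduce programming. It is not the algorithm of [27]; the station identifiers, temperature values and function names are hypothetical and chosen only for illustration. Each mapper aggregates a partial sum and count per key locally and emits once per key, so fewer intermediate pairs reach the reducer.

```python
from collections import defaultdict


def mapper_with_in_mapper_combining(records):
    """Aggregate (sum, count) per station inside the mapper, then emit once per key."""
    partial = defaultdict(lambda: [0.0, 0])
    for station, temperature in records:
        partial[station][0] += temperature
        partial[station][1] += 1
    for station, (total, count) in partial.items():
        yield station, (total, count)


def reducer(pairs):
    """Merge the partial (sum, count) pairs and return the mean per station."""
    merged = defaultdict(lambda: [0.0, 0])
    for station, (total, count) in pairs:
        merged[station][0] += total
        merged[station][1] += count
    return {station: total / count for station, (total, count) in merged.items()}


# Hypothetical input splits of (station id, temperature in degrees Celsius).
split_1 = [("S1", 20.0), ("S1", 22.0), ("S2", 10.0)]
split_2 = [("S1", 24.0), ("S2", 14.0)]

# Each split is processed by its own mapper call; only aggregated pairs are emitted.
emitted = [
    pair
    for split in (split_1, split_2)
    for pair in mapper_with_in_mapper_combining(split)
]
means = reducer(emitted)
```

In this toy run, five input records are reduced to four emitted pairs before the reduce step; on real data with many records per key the saving in intermediate traffic would be far larger.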

Big Data Research of Climate Change by Topics
Besides a few systematic reviews of big data and its techniques, applications and challenges in a broad sense (see [20,28–30]), there have been a number of subject-specific review papers that summarise big data applications in a selected subject (see, for example, crime analysis [13], causality analysis [31], petroleum industry [32], banking [1], blockchain and cryptocurrency [2], etc.). This section further categorizes big data research in climate change mainly into five topics: energy efficiency and intelligence; smart farming, agriculture and forestry; sustainable urban planning and infrastructure; natural disaster and disease assessment; and other advanced supports. For each topic, a selective review of applications and research from recent years is presented in chronological order.

Energy Efficiency and Intelligence
One of the primary roles of big data analytics in climate change is managing and utilizing resources to fight global warming. There have been extensive applications focusing on improving energy efficiency and building up intelligent energy networks.
Zhou et al. [33] provided a review of smart energy management incorporating big data analytics, in which the authors presented the industrial infrastructure and resources for smart energy management. In the context of opportunities coming along with big data analytics in electric utilities, the relevant issues and opportunities are discussed and reviewed in [34], focusing on the policymaking perspective. The review of [35] summarised the data driven approaches for predicting building energy consumption at urban level, whilst the data driven technologies and applications are reviewed in regard to the development of intelligent energy networks in [36].
Recent reviews of big data applications in smart grids/smart meters can be found in [37–39], in which the authors aim to provide a road map by comprehensively summarizing the most up-to-date applications in this specific subject, introducing the latest big data analytics technologies adopted, and listing the mainstream platforms. The papers that investigate big data applications and technologies for building energy management can be found in [40–42]. Similarly, the review of [43] focused on the prediction and classification of building energy consumption under future micro-scale change for a particular building; the review of [44] addressed uncertainty analysis and big climate data in assessing building energy performance; from the perspective of data mining techniques and applications, research on improving building operational performance is summarized in [45]; and the review of [46] emphasises the cyber-physical systems of smart buildings and the diverse data driven approaches for energy efficiency control, particularly focusing on strategies for existing buildings.
Given that the available recent reviews have comprehensively covered the studies before 2017 and some of the research in 2017, here we focus on identifying the latest significant research and interdisciplinary research that have not been previously reviewed, so as to present the most complete and up-to-date overview of this particular subject.
In terms of the latest developments in big data analytics platforms/systems, Linder et al. [47] introduced the Big Building Data (BBData) platform for smart buildings using big data technology. It enables observation of the state of a room/building and automated implementation of certain rules to enhance comfort while economizing energy and achieving sustainable development. Another smart building energy management platform, the Internet of Things Energy Platform (IoTEP), is proposed in [48]; it is based on FIWARE and oriented to support and ease the analysis of heterogeneous big energy data with the assistance of complex event processing and clustering.
Zakovorotnyi and Seerig [49] presented a practical and semi-automated artificial neural network clustering method to group daily energy consumption profiles so as to identify typical and atypical behaviour patterns and further improve building energy efficiency. Similarly, to enhance building energy efficiency and operational performance, Fan and Xiao [50] applied gradual pattern mining to extract gradual rules from real-world building energy big data.
Ashouri et al. [51] introduced a new methodology for building energy saving advisory systems by adopting a data mining framework. Unlike the existing frameworks, this system considers the role of occupants. By extracting the consumption patterns and identifying the correlation between occupant activities and building energy consumption, the system can provide feasible energy saving recommendations for occupants.
To overcome the user confidentiality challenge when outsourcing big metering data to heterogeneous distributed systems, Jiang et al. [52] proposed the privacy-preserving query (P2Q) scheme over encrypted multidimensional big metering data so as to achieve user privacy in a semi-trusted cloud with fast response and high search quality.
A decision support system for energy efficiency management is developed in [53] that can effectively process cross domain data and support high-level energy/climate data management and analytics. The novel high level architecture presented by the authors integrates a single entry, processing and semantification point for multiple-sourced data, a data storage cluster, data access control, advanced analytics services and a dashboard interface service.
A specific focus on the energy efficiency assessment of China's industrial sectors can be found in [54], where the authors applied k-means clustering and the multi-dimensional association rules algorithm to evaluate environmental performance and seek possible energy leakages and faults so as to further guide regional energy conservation.
In terms of water energy efficiency management, a systematic review can be found in [55] that summarized the big data analytics applications in rainfall data management, flood risk management, water productivity optimization, drinking water network management, etc. A recent paper [56] thoroughly investigated the big data challenges and coping strategies in water-energy metering intelligence. Moreover, it demonstrated four different case studies in detail as showcase applications of big water-energy data analytics.

Smart Farming, Agriculture and Forestry
Another subject that has significantly benefited from big climate data over the past decade is agriculture. Agriculture is one of the most vulnerable domains as the negative impacts of climate change have emerged over the past several decades. Its production heavily depends on natural resources, and agriculture itself is also one of the main sources of greenhouse gas (GHG) emissions. The developments of big climate data and its analytics have prompted the widespread use of smart information management systems, precision agriculture, as well as intelligent automatic agriculture. These implementations all contribute to the climate-smart aims of optimizing production with minimum costs and GHG emissions. Big data applications in agro-environmental science are summarized in [57], with analyses of the theoretical framework and descriptions of three different case studies so as to further address the limitations and challenges. A review of big data applications in smart farming can be found in [58]. Later, Kamilaris et al. [59] presented a review of big data analytics practices in agriculture, followed by the authors' recent review of deep learning applications in agriculture [60]. Pham and Stack [61] discussed the rapid development of precision agriculture and its impacts on both micro-level firm capabilities and macro-level industry competitiveness, while an Internet of Things platform is designed for precision agriculture in [62]. These existing review papers aimed to present the state-of-the-art big data analytics technologies, frameworks and challenges in agriculture and farming related practices in or before 2017. Therefore, here we focus only on the latest applications and developments in this particular subject to fill the gap and provide the most up-to-date overview.
Some recent developments focus on the forest ecosystem. Franklin et al. [63] adopted four kinds of big data for the prediction of global climate change impacts on plant populations and communities. Although the availability of data contributes to ecosystem and climate change studies, the limitations of spatial representation, plot data, heterogeneity issues, and the lack of data standards still need to be addressed by future research. It is also found that an emerging research trend of big data analytics in forestry resources focuses on China [64]. The province level forestry resource efficiency in China is investigated in [65] using a data envelopment analysis model and the Malmquist total factor productivity index. The information extracted from big data on numerous economic, social and ecological factors, arising from China's widespread forest landscape, has offered valuable recommendations for sustainable development.
Li et al. [66] presented a case study of a big data analytics application in forest ecosystem services in Anhui province, China. Another case study in China was proposed in [67], which focuses on ecosystem services in a karst region (Guangxi province of China) and identifies the positive influence of ecological big data analytics on ecosystem services. Xie et al. [68] focused on the influencing factors of ecological land change in China's Beijing-Tianjin-Hebei region. Using big climate/ecosystem data, the authors discovered substantial differences across factors in their impacts on ecological land change, and used these findings to recommend better strategies for natural resource management. Related research that also targeted China [69] introduced big data analytics with provincial multisourced data to determine the optimal farmland conversion that best balances farmland protection and economic growth.

Sustainable Urban Planning and Infrastructure
Apart from ecosystem and forestry resource management, increasing greenhouse gas emissions have brought attention to sustainable urban planning and infrastructure. For instance, Hughes [70] analyzed urban adaptation planning in the US in response to climate change. In terms of the role of big data analytics in achieving sustainable urban planning and infrastructure that also links to climate change, recent research mainly addresses urban informatics, smart/green cities and smart transportation (note that smart grid related research is summarized within the energy efficiency and intelligence topic).
In [71], Hashem et al. systematically demonstrated the role of big data analytics in forming and supporting smart cities, along with a review of important relevant literature. The authors proposed a model of big smart city data management and evaluated the technological architecture and challenges. A few generic reviews or discussions of big data in smart cities can be found in [72–75], while the role of sustainability in smart cities is explored in [76,77].
A recent comprehensive review by Bibri and Krogstie [78] summarized the existing significant interdisciplinary literature regarding smart sustainable cities. Later, Bibri [79] identified the big data analytics applications enabled by the Internet of Things for achieving environmental sustainability, where the author also summarized the state-of-the-art data processing platforms and associated big data analytics technologies. Another similar piece of research that introduced a cloud-based green Internet of Things architecture can be found in [80]. The most recent discussion regarding big data and smart cities, by Lim et al. [81], seeks to identify the classification and challenges of big data use cases in smart cities.
In terms of technological developments, Zhu et al. [82] demonstrated the latest progression regarding big data and sensor-cloud, and then introduced three different types of sensor-clouds for the big data analytics architecture of green cities. To achieve reduced dimensionality while maintaining a high level of long-term average transmission reward, Huang et al. [83] introduced the threshold-based scheduling policy for the energy harvesting sensor networks used in green cities.
Although intelligent transportation systems associated with big data analytics have been well explored over the past several decades, the sustainability aspect of their application is still at an early stage of development. In a recent paper [84], the bike sharing scheme in Shanghai is evaluated using big data analytics, and the environmental benefits are identified from a spatiotemporal perspective.

Natural Disaster and Disease Assessment
Over the past decade, there have been debates regarding the relationships between climate change and natural disasters and disease [85–87]. Climate change has certainly increased the vulnerability and levels of risk associated with weather extremes and disease spread, and natural disaster/disease assessment and management play an important role in climate science. Given the big data and climate change context, it is also vital to gain the most up-to-date knowledge of big data analytics applications in natural disaster and/or disease management.
There are existing reviews of data mining applications for combating natural disasters [88,89], for instance floods, storms, landslides, volcanic eruptions, tsunamis and earthquakes, in which the authors summarized the relevant big data analytics applications in prediction, detection and management strategy improvement for a variety of natural disasters, along with identifying existing challenges and future research directions. A review that specifically targeted climate prediction applications in China can be found in [90].
With regard to computational development, Miyoshi [91] reported the encouraging performance of the Japanese flagship K supercomputer in big climate data processing compared to the systems operated at the world's weather prediction centers. The reported speed was 120 times faster, and the big data assimilation system was also validated with encouraging results.
One of the highlights is disaster resilience. Ofli et al. [92] addressed the importance of aerial imagery data from unmanned aerial vehicles for timely disaster response, and proposed a hybrid crowdsourcing and real-time machine learning framework to aid the existing artificial intelligence disaster response platform. In a recent piece of research, Papadopoulos et al. [93] employed big data analytics on multisourced online press and social media big data in order to validate disaster resilience in supply chains for sustainability. Some recent research explored big data analytics in disaster assessment. Nguyen et al. [94] applied deep convolutional neural networks (deep-CNN) to process post-disaster social media images, and the damage assessment showed effective and accurate performance, which also outperformed the existing bag-of-visual-words (BoVW) technique. A case study in South Korea [95] predicted the regional level building damage risk under weather extremes or natural disasters using decision tree analysis. In regard to flood damage assessment mapping, the case study of the Myanmar flood in [96] emphasised the significance of global open geo-information, while later Cian et al. [97] adopted earth observation big data and proposed the normalized difference flood index for flood mapping. Given the significant amount of emissions caused by fire disturbance on a global scale, a recent research work by Ramo et al. [98] investigated global burned area mapping using classification data mining techniques.
Big data analytics in global infectious disease surveillance have been systematically reviewed in [99], whilst the corresponding big data applications related to climate science are still at an early stage of development. Traore et al. [100] applied data mining techniques to satellite imagery data to identify risk areas that are exposed to epidemic crises. Similarly, Manogaran and Lopez [101] focused on dengue and proposed a big climate data based surveillance system for continuous monitoring and timely warning.

Other Advanced Supports
Apart from the above domain topics of big data analytics in climate change studies, here we also seek to identify significant relevant research that addresses other non-mainstream subjects, yet brings valuable insights on the role of big data in climate science.
Zhang and Chen [102] studied the prefecture-level big data of the Poyang Lake Eco-Economic Zone in China and offered local experience oriented recommendations to the national strategy given the aim of sustainable development.
Due to the increasing desire to enhance sustainable competitive advantages, product lifecycle management is also playing a significant role in combating climate change. A big data driven product lifecycle management framework was introduced in [103] to assist with decision-making at every stage of the product lifecycle, covering designing, manufacturing, maintenance, disassembling, recycling, disposing, etc. The authors demonstrated the framework in detail and also validated it with real case studies showing significant improvements in material efficiency, energy efficiency, emission reduction, customer service and economic benefits. Later, Liang et al. [104] adopted Fruit Fly Optimisation and proposed a novel cyber physical system for machining optimisation during the manufacturing lifecycle. Another framework, the Intelligent Immune System, is introduced in [105]. It was also designed to improve productivity and save energy; it employed artificial neural networks to identify abnormal patterns and embedded a re-scheduling algorithm for follow-up processing.
A special focus on electrical and electronic equipment waste/recycling management with big data analytics is addressed in [106]. The authors proposed and illustrated a waste recovery/recycling framework based on the Internet of Things and big data technologies, and also discussed the economic and technological challenges of its implementation.

Key Techniques for Big Data in Climate Change
The above sections have presented a selective review of applications and research in recent years, which are categorized into five main topics. In this section, from a technical perspective, we summarize the mainstream techniques for big data in climate change studies regardless of the topics of applications. It is of note that these techniques are not exclusive to big data in climate change studies, but are well established and widespread in big data analyses on a broader range of subjects.
According to [107], big data is not a business solution but the underlying technology, where models are discovered through the algorithmic search process of data exploration [108]. There are a variety of big data tools, and the most popular ones are summarized by [21]: Apache Hadoop and related projects, S4, Storm, Mahout, R, Massive On-line Analysis (MOA), Waikato Environment for Knowledge Analysis (WEKA), and Vowpal Wabbit. Each tool/platform has its own merits, and they are advancing rapidly along with the evolution of the corresponding infrastructures, but the underlying techniques for big data analysis are mainly grouped into four categories: clustering, classification, regression and association rule mining [20]. Moreover, Wu et al. summarized the ten most influential data mining algorithms as C4.5, k-means, SVM (support vector machine), Apriori, EM (expectation maximization), PageRank, AdaBoost, kNN (k-nearest neighbors), Naive Bayes and Classification and Regression Trees (CART), which have been incorporated in almost all commercial and open source big data analysis systems/platforms [12]. To serve as a one-stop directory for interested readers, we also briefly summarize these widely applied techniques as follows (the overall structure can be found in Figure 3). It should also be noted that these techniques are advancing, and new algorithms are emerging, on an almost daily basis along with today's rapid technological progress.
Clustering has the key objective of grouping similar or closely related data objects together through data exploration, so that dissimilar objects are separated [31,109]. Its fundamental concept is the distance measure, and it includes a variety of approaches: hierarchical clustering, k-means (partitioning clustering), high-dimensional methods, density-based clustering, co-occurrence-based clustering and other evolutionary methods.
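To make the role of the distance measure concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only, not the implementation used in any of the reviewed studies. The function name `kmeans`, the choice of Euclidean distance, the use of the first k points as initial centroids, and the sample data are all assumptions made for this example.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Partition points (tuples of numbers) into k clusters by Euclidean distance."""
    # Assumption: the first k points are the initial centroids. This keeps the
    # sketch deterministic; practical code usually uses random or k-means++ seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-dimensional observations forming two visibly separate groups.
sample = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9), (0.0, 0.3), (4.8, 5.0)]
labels, centroids = kmeans(sample, k=2)
print(labels)
```

The other clustering families listed above (hierarchical, density-based, and so on) differ mainly in how they use the distance measure, not in whether they rely on one.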
Classification, on the other hand, categorizes data objects into predefined groups, and it is one of the most fundamental big data analysis techniques, with a collection of well-established methods [110]. One of the best-known methods is the decision tree, which applies a series of carefully crafted questions about the attributes to accomplish the classification task [111]. Support Vector Machines (SVM) [112] divide the objects into two classes using an optimal separating hyperplane designed to minimize the classification error. Another classifier, Naive Bayes, applies Bayes' rule to calculate the probability of each class [113]. This method assumes that all attributes are independent given the class, and the predicted class is the one with the highest posterior probability. Neural networks are inspired by the structure and mechanism of real neurons in gathering and processing information, and they enable the estimation of posterior probabilities to complete the classification task [114]. The kNN approach identifies the nearest neighbours of an observation in order to determine its class label [115].
Association rule mining was first proposed in [116] on a supermarket data set, with the aim of investigating co-occurrences among data objects. According to [117], it is a technique for identifying simultaneous occurrences that happen more frequently than the average co-occurrence frequency in the data set.
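Of the classifiers described above, kNN is the simplest to write down, so a minimal sketch in plain Python is given here for illustration. The function name `knn_predict`, the use of Euclidean distance with a simple majority vote, and the labelled sample data are assumptions of this example rather than details taken from the cited works.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training items nearest to query.

    train is a list of (features, label) pairs; features and query are
    equal-length tuples of numbers.
    """
    # Rank the training items by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled observations with two features each.
train = [
    ((1.0, 1.0), "dry"), ((1.2, 0.8), "dry"), ((0.9, 1.1), "dry"),
    ((4.0, 4.2), "wet"), ((4.1, 3.9), "wet"), ((3.8, 4.0), "wet"),
]
print(knn_predict(train, (1.1, 1.0)))  # prints "dry"
print(knn_predict(train, (4.0, 4.0)))  # prints "wet"
```

Unlike the decision tree, SVM, or Naive Bayes, this sketch has no training phase: all of the work happens at prediction time, which is why kNN becomes expensive on very large data sets unless an index structure is used.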
Regression is also considered an important big data analysis technique due to its capabilities in dimension reduction, information extraction, estimation and prediction. The fundamental concept of regression is to investigate the relationship between two or more variables in order to assist forecasting and decision-making. There are a variety of well-established regression techniques, such as linear and nonlinear regression, lasso regression, logistic regression and regression trees.
Social Network Analysis (SNA) is based on the principles of graph theory and is a relatively new big data analysis technique. It investigates the connections and contents among objects in a massive body of information in order to construct a social network. A basic social network is formed by nodes connected to related nodes by links (namely edges) [118,119]. The most frequently used measures for analyzing the patterns of a social network include degree, density and centrality.
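The three social network measures named above can be illustrated with a short sketch in plain Python. It assumes an undirected network without self-loops or isolated nodes, given as a list of edges, and it uses degree centrality (degree divided by n - 1) as one common choice among several centrality definitions. The function name `network_measures` and the sample edges are hypothetical.

```python
def network_measures(edges):
    """Compute degree, density and degree centrality for an undirected graph.

    edges is a list of (node, node) pairs. Assumption: no self-loops; nodes
    without any edge are not represented and are therefore ignored.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    n = len(adjacency)
    # Degree: number of distinct neighbours of each node.
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    m = sum(degree.values()) // 2  # each edge is counted once per endpoint
    if n < 2:
        return degree, 0.0, {node: 0.0 for node in degree}
    # Density: share of the n(n-1)/2 possible edges that are present.
    density = 2 * m / (n * (n - 1))
    # Degree centrality: degree normalised by the maximum possible degree.
    centrality = {node: d / (n - 1) for node, d in degree.items()}
    return degree, density, centrality


# Hypothetical network of four nodes and four links.
degree, density, centrality = network_measures(
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
)
print(degree)
print(density)
```

In this example node A is linked to every other node, so its degree centrality is 1.0, while four of the six possible links are present, giving a density of two thirds.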

Conclusions and Future Research
Researchers have generally realized the big impact of data-intensive research. Guo et al. [120] have summarized the milestones of Big Earth Data development and the big challenges that follow. In this paper, we present the most up-to-date overview of big data analytics in climate change, covering over 100 research papers. It is observed that energy efficiency remains the most popular "practice court", followed by smart farming/agriculture and natural disaster assessment. Although there is a large body of literature on big data and smart cities, the sustainability aspect that links to climate change is still at an early stage of development. The topic of disease management also lacks thorough exploration. Moreover, some non-mainstream applications are identified in Section 3.5, which cover regional sustainable development strategy recommendation, product lifecycle management, and electronic waste management.
The focus of big data analytics applications in climate change appears to be unbalanced, and topics with valuable potential, such as waste/recycling management, are neglected. The reasons may include a lack of corresponding data resources, a lack of economic profit and research funding [121], poor communication between groups of researchers with different skills, and uncertainty in the information [122]. This review also identifies a few mainstream limitations of the current research: the uniformity and uncertainty of big climate data; the capability and efficiency of learning from and analyzing big data relative to the speed of the continuous data explosion; and the dilemma of temporal, spatial and scale complexity (more details can be found in [3,123,124]).
One of the research trends identified among the recent applications is cloud computing, which provides a better solution for big data storage, transmission and computational requirements [125]. It is also worth noting the popularity of integrating the Internet of Things [126]. A platform architecture that enables efficient, real-time, in-memory cloud computing and storage of complex big data, combined with the most advanced analytics or data mining techniques and integrated with the Internet of Things, is the key to future research.

Figure 1. Framework of big data in climate change studies.

Figure 2. Google Trends index of big data and climate change since 2015.

Figure 3. The structure of big data key techniques.