<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xml:lang="en" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Sensors</journal-id>
<journal-title>Sensors</journal-title>
<issn pub-type="epub">1424-8220</issn>
<publisher>
<publisher-name>Molecular Diversity Preservation International (MDPI)</publisher-name></publisher></journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3390/s110201682</article-id>
<article-id pub-id-type="publisher-id">sensors-11-01682</article-id>
<article-categories>
<subj-group>
<subject>Article</subject></subj-group></article-categories>
<title-group>
<article-title>IJA: An Efficient Algorithm for Query Processing in Sensor Networks</article-title></title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Lee</surname><given-names>Hyun Chang</given-names></name><xref ref-type="aff" rid="af1-sensors-11-01682"><sup>1</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Lee</surname><given-names>Young Jae</given-names></name><xref ref-type="aff" rid="af2-sensors-11-01682"><sup>2</sup></xref><xref ref-type="corresp" rid="c1-sensors-11-01682">*</xref></contrib>
<contrib contrib-type="author">
<name><surname>Lim</surname><given-names>Ji Hyang</given-names></name><xref ref-type="aff" rid="af3-sensors-11-01682"><sup>3</sup></xref></contrib>
<contrib contrib-type="author">
<name><surname>Kim</surname><given-names>Dong Hwa</given-names></name><xref ref-type="aff" rid="af4-sensors-11-01682"><sup>4</sup></xref></contrib></contrib-group>
<aff id="af1-sensors-11-01682">
<label>1</label> Division of Information and e-Commerce, Wonkwang University, Iksan, Korea; E-Mail: <email>hclglory@wku.ac.kr</email></aff>
<aff id="af2-sensors-11-01682">
<label>2</label> Department of Multimedia, Jeonju University, Jeonju, Korea</aff>
<aff id="af3-sensors-11-01682">
<label>3</label> Department of Art Therapy, Daegu Cyber University, Daegu, Korea; E-Mail: <email>possible@dcu.ac.kr</email></aff>
<aff id="af4-sensors-11-01682">
<label>4</label> Control Instrumentation Engineering Major, Hanbat National University, Daejeon, Korea; E-Mail: <email>kimdh@hanbat.ac.kr</email></aff>
<author-notes>
<corresp id="c1-sensors-11-01682">
<label>*</label>Author to whom correspondence should be addressed; E-Mail: <email>leeyj@jj.ac.kr</email>; Tel.: +82-63-220-2936.</corresp></author-notes>
<pub-date pub-type="collection">
<year>2011</year></pub-date>
<pub-date pub-type="epub">
<day>28</day>
<month>1</month>
<year>2011</year></pub-date>
<volume>11</volume>
<issue>2</issue>
<fpage>1682</fpage>
<lpage>1692</lpage>
<history>
<date date-type="received">
<day>29</day>
<month>11</month>
<year>2010</year></date>
<date date-type="rev-recd">
<day>10</day>
<month>1</month>
<year>2011</year></date>
<date date-type="accepted">
<day>12</day>
<month>1</month>
<year>2011</year></date></history>
<permissions>
<copyright-statement>© 2011 by the authors; licensee MDPI, Basel, Switzerland.</copyright-statement>
<copyright-year>2011</copyright-year>
<license>
<p>This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).</p></license></permissions>
<abstract>
<p>One of the main features of sensor networks is the ability to process real-time state information after gathering the needed data from many domains. The component technologies that make up each sensor node, including physical sensors, processors, actuators and power sources, have advanced significantly over the last decade. Thanks to these advances, sensor networks have over time been adopted across industry for sensing physical phenomena. However, sensor nodes are considerably constrained: given their energy and memory resources, they have a very limited ability to process information compared to conventional computer systems, so query processing over the nodes must respect these limitations. For this reason, join operations in sensor networks are typically processed in a distributed manner over a set of nodes. While simple queries, such as select and aggregate queries, have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. In this paper, we therefore propose and describe an Incremental Join Algorithm (IJA) for sensor networks that reduces the overhead caused by moving join pairs to the final join node and minimizes the communication cost, which is the main consumer of battery power when processing distributed queries in sensor network environments. The simulation results show that the proposed IJA significantly reduces the number of bytes moved to the join nodes compared to the popular synopsis join algorithm.</p></abstract>
<kwd-group>
<kwd>sensor network communication cost</kwd>
<kwd>incremental algorithm</kwd>
<kwd>in-network query processing</kwd>
<kwd>wireless sensor networks</kwd></kwd-group></article-meta></front>
<body>
<sec sec-type="intro">
<label>1.</label>
<title>Introduction</title>
<p>Technological advances, decreasing production costs and increasing capabilities have made sensor networks suitable for many scientific and commercial applications, including warehouse management, battlefield surveillance and environmental monitoring [<xref ref-type="bibr" rid="b1-sensors-11-01682">1</xref>–<xref ref-type="bibr" rid="b4-sensors-11-01682">4</xref>]. Thanks to these advances, sensor networks have over time been adopted across industry. Gathering data from these sensors in order to observe the state of the environment is achieved by modeling the network as a distributed database in which sensor readings are collected and processed using queries [<xref ref-type="bibr" rid="b5-sensors-11-01682">5</xref>–<xref ref-type="bibr" rid="b7-sensors-11-01682">7</xref>].</p>
<p>In particular, each sensor node obtains state information from its sensing devices and stores those data. Accordingly, each sensor node in a sensor network is regarded as part of a distributed database system that generates a data stream, and such systems have been studied as sensor databases [<xref ref-type="bibr" rid="b7-sensors-11-01682">7</xref>]. As in distributed database environments, the join operation, which correlates sensor readings, is costly in sensor networks [<xref ref-type="bibr" rid="b8-sensors-11-01682">8</xref>]. Many researchers have therefore studied ways of reducing this cost in sensor environments [<xref ref-type="bibr" rid="b1-sensors-11-01682">1</xref>].</p>
<p>In a sensor network environment, a query is issued to retrieve and gather real-time state information. Queries in sensor networks are commonly expressed in an SQL-like declarative language [<xref ref-type="bibr" rid="b8-sensors-11-01682">8</xref>]. The data collected in a sensor network can be seen as one relation distributed over the sensor nodes, called the sensor relation. The available query operations are also restricted by the limitations of the environment. Further, most previous solutions either assume that nodes have sufficient memory to buffer the partition of the join relations assigned to them for processing, or that the amount of memory available at each node is known in advance and the assigned data partitions can be set accordingly [<xref ref-type="bibr" rid="b1-sensors-11-01682">1</xref>]. These assumptions are hard to satisfy in real deployments.</p>
<p>Therefore, we focus on communication cost and propose an Incremental Join Algorithm (IJA), an in-network join strategy that processes joins efficiently in sensor networks and minimizes communication cost. The IJA strategy reduces communication cost while still making use of the real-time state information gathered from sensors, which is one of the defining features of sensor networks. In sensor network environments, it is hard to send all the data stored at each node to a central server, as the assumptions above would require. The data therefore need to be filtered to decide whether they are sent or not. The problem with previous studies is that the results computed in earlier steps are ignored when a new query arrives. In contrast, the algorithm in this paper reuses the previous results and sends only the data that still need to be joined to the final node. We therefore assume that previous join results are stored in a temporary repository so that they can be processed efficiently. To evaluate the performance, we compare the IJA strategy to conventional algorithms, including the synopsis strategy. The remainder of the paper is structured as follows. In Section 2, we describe typical join strategies in sensor networks, including the synopsis strategy used for comparison. In Section 3, we introduce and explain the incremental join algorithm. In Section 4, we analyze its performance and compare IJA to the typical join strategies, including the synopsis algorithm. In Section 5, we conclude and outline future work.</p></sec>
<sec>
<label>2.</label>
<title>Related Works</title>
<p>A sensor network is formed by hundreds of fixed sensor nodes. The values obtained periodically from an individual sensor node are therefore insufficient to express all the information about an event or entity, and a join operation is needed to combine them. The data stored at the sensor nodes form a kind of table over all nodes, denoted R. To process a join query, we first have to decide which type of join is used. In this paper, we consider the binary equi-join (BEJ). A BEJ query for sensor networks is defined as follows:</p>
<p><italic>Definition 1</italic>
<disp-formula>
<mml:math display="block">
<mml:mrow>
<mml:mtable columnalign="center">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mtext>Given two sensor tables </mml:mtext>
<mml:mi mathvariant="bold-italic">R</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi></mml:mrow>
<mml:mn mathvariant="italic">1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi></mml:mrow>
<mml:mn mathvariant="italic">2</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mn>...</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi></mml:mrow>
<mml:mi>n</mml:mi></mml:msub></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo> </mml:mo>
<mml:mtext>and </mml:mtext>
<mml:mi>S</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mn mathvariant="italic">1</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi></mml:mrow>
<mml:mn mathvariant="italic">2</mml:mn></mml:msub>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mn>...</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:msub>
<mml:mi>B</mml:mi>
<mml:mi>m</mml:mi></mml:msub></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mrow>
<mml:mtext>a binary equi-join (BEJ)</mml:mtext></mml:mrow>
<mml:mo> </mml:mo>
<mml:mtext>is</mml:mtext></mml:mrow></mml:mtd></mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:mi mathvariant="bold">R</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mo>⋈</mml:mo></mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mtext>Ai</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mtext>Bj</mml:mtext></mml:mrow></mml:mrow></mml:msub>
<mml:mi mathvariant="bold">S</mml:mi>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mtext>i </mml:mtext>
<mml:mo>∈</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mn>....</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mi mathvariant="normal">n</mml:mi></mml:mrow>
<mml:mo>}</mml:mo></mml:mrow>
<mml:mo>,</mml:mo>
<mml:mtext>j </mml:mtext>
<mml:mo>∈</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mn>....</mml:mn>
<mml:mo>,</mml:mo>
<mml:mo> </mml:mo>
<mml:mi mathvariant="normal">m</mml:mi></mml:mrow>
<mml:mo>}</mml:mo></mml:mrow></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>where A<sub>i</sub> and B<sub>j</sub> are two attributes of R and S respectively, which have the same domain.</p>
<p>Consider a sensor network covering a road network from [<xref ref-type="bibr" rid="b7-sensors-11-01682">7</xref>]. Each sensor node can detect the IDs of vehicles in its close vicinity, and record the timestamps at which the vehicles are detected. Suppose N<sub>R</sub> and N<sub>S</sub> represent two sets of sensor nodes located at two regions of a road segment, Region1 and Region2, respectively. To gather the necessary data for determining the speeds of vehicles traveling between the two regions, the following join query can be expressed:
<list list-type="simple">
<list-item>
<p>SELECT R.autoID, R.time, S.time</p></list-item>
<list-item>
<p>FROM R, S</p></list-item>
<list-item>
<p>WHERE R.location IN Region1 AND S.location IN Region2 AND R.autoID = S.autoID</p></list-item></list></p>
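<p>As an illustration only, the query above can be evaluated centrally with a small hash-based equi-join. The following is a minimal sketch, not part of the original system: the function name equi_join, the sample readings and the dictionary layout of a pair are all assumptions made for the example, and the region predicates are assumed to have been applied already when R and S were collected.</p>
<preformat>
```python
from collections import defaultdict


def equi_join(r_rows, s_rows, key):
    """Hash-based binary equi-join of two lists of dict rows on one attribute."""
    # Build a hash index over S keyed by the join attribute.
    index = defaultdict(list)
    for s in s_rows:
        index[s[key]].append(s)
    # Probe the index with every R row and emit the matching combinations.
    return [(r, s) for r in r_rows for s in index.get(r[key], [])]


# Hypothetical readings (autoID, time) detected in Region1 and Region2.
R = [{"autoID": "car1", "time": 10.0}, {"autoID": "car2", "time": 12.0}]
S = [{"autoID": "car1", "time": 25.0}, {"autoID": "car3", "time": 30.0}]

# Counterpart of SELECT R.autoID, R.time, S.time ... WHERE R.autoID = S.autoID
result = [(r["autoID"], r["time"], s["time"]) for r, s in equi_join(R, S, "autoID")]
print(result)
```
</preformat>
<p>Only car1 is seen in both regions, so it is the only vehicle for which a travel time between the two regions could be derived.</p>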
<p>To evaluate the above query, sensor readings from Region1 and Region2 need to be collected and joined on the autoID attribute. Typical join strategies for sensor networks are classified, according to the join location, into naïve join, sequential join and centroid join, as shown in <xref ref-type="fig" rid="f1-sensors-11-01682">Figure 1</xref> [<xref ref-type="bibr" rid="b9-sensors-11-01682">9</xref>].</p>
<p>General problems of join processing are the heavy communication cost of transferring data among nodes and, when the join selectivity of the query is low, the overhead of transmitting pairs that do not contribute to the result. For instance, given two tables R and S, pairs that do not participate in the join between R and S may still be sent to another region F for joining. In the naïve join algorithm in <xref ref-type="fig" rid="f1-sensors-11-01682">Figure 1(a)</xref>, sensor nodes around the sink node in region F are selected as the join nodes N<sub>F</sub>. Although this minimizes the cost of routing the join results to the sink node, the whole of each table in regions R and S is routed to the sink.</p>
<p>In the sequential join algorithm in <xref ref-type="fig" rid="f1-sensors-11-01682">Figure 1(b)</xref>, the cost is reduced by routing the join results to the sink after performing the local join R<sub>i</sub> ⋈ S, where R<sub>i</sub> is the local table stored at node n<sub>i</sub> in region R and S is the table in region S. The problem with this algorithm is that the whole table in region S is delivered to the nodes in region R, which makes the communication cost high.</p>
<p>Like the earlier algorithms, the centroid join algorithm in <xref ref-type="fig" rid="f1-sensors-11-01682">Figure 1(c)</xref> delivers the tables in each region to region F. However, the join nodes are selected so that they are close to both regions R and S, which reduces the communication cost. Nevertheless, the tables in each region still need to be routed to the join region F.</p>
<p>To solve or reduce the problems above, the synopsis join (SNJ) strategy was suggested. SNJ uses synopses to remove the pairs of tables R and S that do not participate in the join, and only then sends the remaining pairs to be joined. A synopsis is a summary of a table used to process the join operation, and it is smaller than the original table. Each sensor therefore creates its own synopsis [<xref ref-type="bibr" rid="b9-sensors-11-01682">9</xref>]. The synopsis strategy consists of three steps: the synopsis join, the notification, and the final join.</p>
<p>The first step, the synopsis join, is as follows. Each node n<sub>i</sub> ∈ N<sub>R</sub> stores a local table R<sub>i</sub>, which is one of the local tables that make up table R. Each node n<sub>i</sub> also creates a local synopsis S<sub>i</sub> (R<sub>i</sub>) by extracting the join attribute A<sub>j</sub> and counting the frequency of each value in the table. A synopsis join region N<sub>L</sub> is selected in which the synopses of tables R and S are joined to obtain the final join candidate pairs. After receiving the synopses from N<sub>R</sub> and N<sub>S</sub>, the synopsis join nodes perform the synopsis join.</p>
<p>The second step, the notification, informs the nodes in N<sub>R</sub> and N<sub>S</sub> of the final join candidate pairs. For this purpose, each synopsis join node n<sub>1</sub> stores the sensor ID from which each local synopsis originated. In the third step, each node of N<sub>R</sub> or N<sub>S</sub> notified by a synopsis join node n<sub>1</sub> sends its pairs with join attribute value v to the final join node n<sub>f</sub>. The final join node n<sub>f</sub> computes R<sub>v</sub> ⋈ S<sub>v</sub> and then sends the results to the query sink node.</p>
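<p>The first two steps of the synopsis strategy can be illustrated with a short sketch. It is a simplified reading of the description above rather than the published implementation: the function names, the node identifiers (r1, r2, s1) and the sample values are hypothetical, and routing between regions is not modeled.</p>
<preformat>
```python
from collections import Counter


def local_synopsis(rows, key):
    """Summarise a local table as {join attribute value: frequency}."""
    return Counter(row[key] for row in rows)


def synopsis_join(r_synopses, s_synopses):
    """Return, per join value, the node IDs on each side that hold candidates.

    r_synopses and s_synopses map a node ID to that node's local synopsis.
    """
    r_nodes, s_nodes = {}, {}
    for node, synopsis in r_synopses.items():
        for value in synopsis:
            r_nodes.setdefault(value, set()).add(node)
    for node, synopsis in s_synopses.items():
        for value in synopsis:
            s_nodes.setdefault(value, set()).add(node)
    # Only values present on both sides can produce join results.
    return {v: (r_nodes[v], s_nodes[v]) for v in r_nodes if v in s_nodes}


r_syn = {
    "r1": local_synopsis([{"autoID": "a"}, {"autoID": "a"}, {"autoID": "b"}], "autoID"),
    "r2": local_synopsis([{"autoID": "c"}], "autoID"),
}
s_syn = {"s1": local_synopsis([{"autoID": "a"}, {"autoID": "d"}], "autoID")}

# The returned node IDs are the ones that would be notified in the second step.
candidates = synopsis_join(r_syn, s_syn)
print(candidates)
```
</preformat>
<p>In this example only nodes r1 and s1 would be notified, and only for value a; the pairs with values b, c and d never leave their regions.</p>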
<p>However, although the synopsis algorithm helps to reduce the communication cost, it only eliminates duplicated data, so a large amount of data may still be delivered when the table contains many distinct attribute values. We therefore process data on a per-query basis and propose an incremental join algorithm.</p></sec>
<sec>
<label>3.</label>
<title>IJA: Incremental Join Algorithm</title>
<p>In this section, we propose an incremental join algorithm (IJA) for gathering and processing real-time state information, which is one of the defining features of sensor networks. We first describe the general environment and its terminology in Section 3.1 [<xref ref-type="bibr" rid="b7-sensors-11-01682">7</xref>,<xref ref-type="bibr" rid="b11-sensors-11-01682">11</xref>], and then present the algorithm in Section 3.2.</p>
<sec>
<label>3.1.</label>
<title>General Environment</title>
<p>Consider a sensor network consisting of <italic>N</italic> sensor nodes. We assume there are two virtual tables in the sensor network, <italic>R</italic> and <italic>S</italic>, containing sensor readings distributed over the sensors. Each sensor reading is a pair with two mandatory attributes, timestamp and sensorID, indicating the time and the sensor at which the pair is generated. A sensor reading may contain other attributes that are measurements generated by one or more sensors, e.g., temperature or autoID. We are interested in the evaluation of static one-shot binary equi-join queries in sensor networks. We assume that <italic>R</italic> and <italic>S</italic> are stored in two sets of sensor nodes <italic>N<sub>R</sub></italic> and <italic>N<sub>S</sub></italic> located in two distinct regions known as R and S respectively. A BEJ query can be issued from any sensor node, called the query sink, which is responsible for collecting the join result. A set of nodes, referred to as join nodes, is required to process the join collaboratively.</p>
<p>When a join query is issued, a join node selection process is initiated to find a set of join nodes <italic>N<sub>F</sub></italic> to perform the join. <italic>R</italic> pairs are routed to a join region <italic>F</italic> where the join nodes <italic>N<sub>F</sub></italic> reside. Each join node <italic>n<sub>f</sub></italic> ∈<italic>N<sub>F</sub></italic> stores a horizontal partition of the table <italic>R</italic>, denoted as <italic>R<sub>f</sub></italic>. <italic>S</italic> pairs are transmitted to and broadcast in <italic>F</italic>. Each join node <italic>n<sub>f</sub></italic> receives a copy of <italic>S</italic> and processes the local join <italic>R<sub>f</sub></italic> ⋈ <italic>S</italic>. The query sink obtains the join results by collecting the partial join results at each <italic>n<sub>f</sub></italic>.</p>
<p>The selection of <italic>N<sub>F</sub></italic> is critical to the join performance. Join node selection involves selecting the number of nodes in <italic>N<sub>F</sub></italic>, denoted by |<italic>N<sub>F</sub></italic>|, and the location of the join region <italic>F</italic>. To avoid memory overflow, assuming <italic>R</italic> is evenly distributed in <italic>N<sub>F</sub></italic>, |<italic>N<sub>F</sub></italic>| should be at least |<italic>R</italic>|/<italic>m</italic>, where |<italic>R</italic>| denotes the number of pairs in <italic>R</italic> and <italic>m</italic> denotes the maximum number of <italic>R</italic> pairs a join node <italic>n<sub>f</sub></italic> can store. For all other settings, our experiments use the same conditions as previous research.</p></sec>
<sec>
<label>3.2.</label>
<title>Incremental Join Algorithm Strategy</title>
<p><xref ref-type="fig" rid="f2-sensors-11-01682">Figure 2</xref> shows the flow of IJA. As in previous research, the sink node in IJA is the node at which the query is issued, and it is responsible for gathering the query results from each region. The difference from previous research is that the data routed from each region consist only of join results that actually participate in the query. To this end, we assume that the nodes in each region learn from the query which tables participate in the join. Each node performs a local join operation based on this query information. For instance, pairs of table <italic>R</italic> named in a query are joined with the table <italic>S</italic> stored in region S, and pairs of table <italic>S</italic> are handled symmetrically in region R. The query therefore provides the information needed to perform the local joins in each region. If a region has no local join left to perform for the query, nothing further needs to be routed and delivered to the final node. In addition, because processing is done per query, the communication cost can be substantially reduced.</p>
<p>The steps for IJA are the following:
<list list-type="order">
<list-item>
<p>Send an event pair of <italic>R</italic> (or <italic>S</italic>) to the other region. The pair is sent only to region S (or region R) in order to build a semi table P<sub>R</sub> (or P<sub>S</sub>) at the counterpart.</p></list-item>
<list-item>
<p>Perform the join operation at each region to produce a semi table. Send the semi table to region F only if join results exist.</p></list-item>
<list-item>
<p>Join the semi tables from R and S in region F. Send the join results to the query sink.</p></list-item>
<list-item>
<p>At the query sink node, the query obtains its result from region F without joining the whole of R and S.</p></list-item></list></p>
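<p>The four steps above can be sketched as follows. This is one possible reading of the steps, under the assumption that a semi table holds exactly those pairs of one table that have a join partner in the counterpart region; the function names and sample pairs are hypothetical, and message routing and the per-event timing of step 1 are not modeled.</p>
<preformat>
```python
def semi_table(pairs, counterpart, key):
    """Steps 1-2: keep only the pairs that have a join partner in the counterpart region."""
    partner_values = {row[key] for row in counterpart}
    return [pair for pair in pairs if pair[key] in partner_values]


def join(left, right, key):
    """Plain nested-loop equi-join, used for step 3 and as a reference result."""
    return [(x, y) for x in left for y in right if x[key] == y[key]]


# Hypothetical contents of the tables stored in regions R and S.
R = [{"autoID": "a", "time": 1}, {"autoID": "b", "time": 2}, {"autoID": "c", "time": 3}]
S = [{"autoID": "a", "time": 5}, {"autoID": "c", "time": 6}, {"autoID": "d", "time": 7}]

P_R = semi_table(R, S, "autoID")  # built in region S from the R pairs sent there
P_S = semi_table(S, R, "autoID")  # built in region R from the S pairs sent there

# Step 3: region F joins only the semi tables, and only if both are non-empty.
# Step 4: the query sink receives this result.
result = join(P_R, P_S, "autoID") if P_R and P_S else []
print(len(P_R), len(P_S), len(result))
```
</preformat>
<p>Under this reading, joining the two semi tables yields the same result as joining R and S directly, while region F receives four pairs instead of six.</p>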
<p>The objective of the incremental join algorithm is to produce a smaller semi table by processing the join operation in the counterpart region, because that is where it can be decided whether an event pair is useful for the join. The remaining operations in the regions, such as selecting the center location of a region and the routing protocol, are based on [<xref ref-type="bibr" rid="b7-sensors-11-01682">7</xref>,<xref ref-type="bibr" rid="b13-sensors-11-01682">13</xref>]. The number of join nodes in the semi-table join region is determined by the sizes of P<sub>R</sub> and P<sub>S</sub> arriving at the N<sub>H</sub> region. Therefore, given a memory capacity of m pairs per node, the number of nodes in the semi-table region is as follows:
<disp-formula>
<mml:math display="block">
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">N</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">P</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">R</mml:mi></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">P</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">S</mml:mi></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:math></disp-formula></p>
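<p>As a small worked example of the node-count formula above, the sketch below evaluates it for sample sizes. Rounding the ratio up to a whole number of nodes is an assumption added here, since the formula itself gives only the ratio; the function name and the semi-table sizes are hypothetical.</p>
<preformat>
```python
import math


def semi_join_node_count(p_r_size, p_s_size, m):
    """Number of join nodes needed so that no node stores more than m pairs."""
    if not m > 0:
        raise ValueError("m must be positive")
    # Assumption: round up, because a fraction of a node cannot be allocated.
    return math.ceil((p_r_size + p_s_size) / m)


# Hypothetical semi-table sizes with a per-node capacity of m = 250 pairs.
n_h = semi_join_node_count(300, 150, 250)
print(n_h)
```
</preformat>
<p>With 300 and 150 pairs in the two semi tables, the ratio is 1.8, so two join nodes would be needed under the rounding assumption.</p>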
<p>The difference from other algorithms is that a pair is sent as soon as it is generated by an insert or update operation. To compute the communication cost, |N<sub>F</sub>| is given as follows:
<disp-formula>
<mml:math display="block">
<mml:mrow>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">N</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">F</mml:mi></mml:msub></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">R</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">i</mml:mi></mml:msub></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow>
<mml:mo>+</mml:mo>
<mml:mo>∑</mml:mo>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:mi mathvariant="normal">C</mml:mi>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">S</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">j</mml:mi></mml:msub></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow></mml:mrow>
<mml:mo>|</mml:mo></mml:mrow></mml:mrow>
<mml:mo>)</mml:mo></mml:mrow>
<mml:mo>/</mml:mo>
<mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:math></disp-formula>
<disp-formula>
<mml:math display="block">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">n</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">i</mml:mi></mml:msub>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">N</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">R</mml:mi></mml:msub>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:mo> </mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">n</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">j</mml:mi></mml:msub>
<mml:mo>∈</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi mathvariant="normal">N</mml:mi></mml:mrow>
<mml:mi mathvariant="normal">S</mml:mi></mml:msub></mml:mrow></mml:math></disp-formula>where |C(R<sub>i</sub>)| is the number of join candidate pairs arriving from node n<sub>i</sub> ∈ N<sub>R</sub>, and |C(S<sub>j</sub>)| is the number of join candidate pairs arriving from node n<sub>j</sub> ∈ N<sub>S</sub>.</p></sec></sec>
<sec>
<label>4.</label>
<title>Performance Evaluation of IJA</title>
<p>In this section, we evaluate the performance of IJA and compare it with typical join strategies, namely naïve join, sequential join and centroid join, as well as the synopsis strategy. The experiments mainly measure the total number of messages incurred by each join strategy, because join processing in a sensor network is a complex operation owing to the distributed nature of the processing and the limited memory at the nodes. Other performance comparisons and experiments are left for future work.</p>
<sec>
<label>4.1.</label>
<title>Experiment Environments</title>
<p>The join operation in large-scale sensor networks must be processed in a distributed manner, because a single node cannot buffer all the data that need to be joined for most queries. For the experiments in this work, we therefore ran the same simulation experiments as those performed for the synopsis strategy [<xref ref-type="bibr" rid="b7-sensors-11-01682">7</xref>], comparing naïve join, sequential join, centroid join and the synopsis strategy. The simulated network consists of 10,000 sensor nodes uniformly placed in a 100 × 100 grid. Each grid cell contains one sensor node located at its center. The regions R and S are located at the bottom-right and bottom-left corners of the network region, respectively, each covering 870 sensor nodes. Table <italic>R</italic> in region R consists of 2,000 pairs, while <italic>S</italic> in region S consists of 1,000 pairs. They are uniformly distributed in regions R and S. For communication cost, we set a message size of 40 bytes, which is equal to the size of a data pair. A pair in the join result is 80 bytes since it is a concatenation of two data pairs. The messages for synchronization and coordination among the sensors are negligible compared to the data traffic caused by large tables. Further, to simplify the analysis, we assume that no failures occur when sending and receiving messages among nodes.</p></sec>
<sec>
<label>4.2.</label>
<title>Performance Evaluation</title>
<p>We first varied the join selectivity and the synopsis selectivity for the synopsis strategy. Join selectivity δ is defined as |<italic>R</italic> ⋈ <italic>S</italic>| / (|<italic>R</italic>| × |<italic>S</italic>|). The join attribute values are uniformly distributed within the domain of the attribute.</p>
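<p>To make the definition of δ concrete, the following sketch converts a selectivity into a result size and a byte count using the table and message sizes from Section 4.1. It is simple arithmetic for illustration, not the simulator used in the experiments: the function names are hypothetical, δ = 0.01 is just one sample point, and hop counts along the routing path are not modeled.</p>
<preformat>
```python
PAIR_BYTES = 40         # size of one data pair (Section 4.1)
RESULT_PAIR_BYTES = 80  # a result pair concatenates two data pairs


def join_result_size(r_size, s_size, selectivity):
    """Number of result pairs implied by delta = |R join S| / (|R| x |S|)."""
    return round(selectivity * r_size * s_size)


def result_bytes(r_size, s_size, selectivity):
    """Bytes in the join result for a single transmission (no hop count)."""
    return join_result_size(r_size, s_size, selectivity) * RESULT_PAIR_BYTES


# |R| = 2,000 and |S| = 1,000 as in the experiments; delta = 0.01 is a sample value.
pairs = join_result_size(2000, 1000, 0.01)
input_bytes = (2000 + 1000) * PAIR_BYTES
print(pairs, result_bytes(2000, 1000, 0.01), input_bytes)
```
</preformat>
<p>At δ = 0.01 the join result contains 20,000 pairs (1,600,000 bytes), which is larger than the 120,000 bytes of the two input tables; this is why the join selectivity has such a strong effect on the cost of delivering results to the sink.</p>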
<p><xref ref-type="fig" rid="f3-sensors-11-01682">Figure 3</xref> shows the total communication cost for different join selectivities while keeping the memory capacity and synopsis size fixed at 250 × 40 bytes and 10 bytes respectively. As shown in the figure, naïve join performs worse than all the others owing to the high cost of routing <italic>S</italic> in region S to all nodes in N<sub>R</sub>. Sequential join also performs worse than centroid join and the synopsis strategy. We therefore exclude these two strategies when the join selectivity is greater than 0.01. The synopsis strategy incurs a lower cost than the others because non-candidate pairs can be identified in the synopsis join stage, so only a small portion of the data is transmitted during the final join. IJA, however, performs better than all of these algorithms, although this is not shown in <xref ref-type="fig" rid="f3-sensors-11-01682">Figure 3</xref>. <xref ref-type="fig" rid="f4-sensors-11-01682">Figure 4(a)</xref> therefore compares the synopsis algorithm with the incremental join algorithm.</p>
<p><xref ref-type="fig" rid="f4-sensors-11-01682">Figure 4(b)</xref> enlarges the part of <xref ref-type="fig" rid="f4-sensors-11-01682">Figure 4(a)</xref> in which the join selectivity is below 0.02. In that range, the synopsis strategy has a lower communication cost than IJA. This is because the synopsis strategy depends on both the join selectivity and the synopsis selectivity, each of which has a strong effect on communication cost. In this experiment, the synopsis rate is fixed at 0.01. The effect of varying the synopsis rate is shown in <xref ref-type="fig" rid="f5-sensors-11-01682">Figure 5</xref>.</p>
<p>The results show that the more replicated data there are, the more efficient the synopsis algorithm becomes. Nevertheless, IJA is more efficient than the synopsis strategy when less than 60% of the data are replicated, and a setting in which more than 60% of the data are replicated is unrealistic. The proposed IJA is therefore an appropriate algorithm for integrating data in sensor network environments.</p></sec></sec>
<sec sec-type="conclusions">
<label>5.</label>
<title>Conclusions</title>
<p>Sensor networks have been adopted in various scientific and commercial applications. Gathering data from sensors is achieved by modeling the network as a distributed database where sensor readings are collected and processed using queries. Sensor nodes are generally highly constrained, in particular regarding their energy and memory resources. While simple queries such as SELECT and AGGREGATE queries in wireless sensor networks have been addressed in the literature, the processing of join queries in sensor networks remains to be investigated. Previous approaches have either assumed that the join processing nodes have sufficient memory to buffer the subset of the join relations assigned to them, or that the amount of available memory at nodes is known in advance.</p>
<p>In this paper, under these assumptions, we describe an Incremental Join Algorithm (IJA) for sensor networks. It aims to reduce the overhead caused by moving join pairs to the final join node and to minimize the communication cost, which is the main consumer of battery power when distributed queries are processed in sensor network environments. To evaluate IJA, we compare it with typical algorithms, including the synopsis algorithm, which is a representative strategy for processing queries in sensor networks, and we report the results of these comparisons. When the join selectivity is below 0.01, typical join algorithms such as naïve, sequential and centroid join perform worse than the synopsis algorithm and IJA.</p>
<p>Although the synopsis algorithm performs better than the typical join algorithms, IJA performs better than the synopsis algorithm when the proportion of replicated data is below 60%. As future work, we will study how varying parameters such as network density, node memory capacity and synopsis size affects the communication cost.</p></sec></body>
<back>
<ack>
<p>This paper was supported by Wonkwang University in 2010.</p></ack>
<ref-list>
<title>References and Notes</title>
<ref id="b1-sensors-11-01682"><label>1.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Coman</surname><given-names>A.</given-names></name><name><surname>Nascimento</surname><given-names>M.A.</given-names></name></person-group><article-title>A distributed Algorithm for Joins in Sensor Networks</article-title><conf-name>Proceedings of International Conference on SSDBM</conf-name><conf-loc>Banff, AB, Canada</conf-loc><conf-date>July 19, 2007.</conf-date></citation></ref>
<ref id="b2-sensors-11-01682"><label>2.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Mainwaring</surname><given-names>A.</given-names></name><name><surname>Culler</surname><given-names>D.</given-names></name><name><surname>Polastre</surname><given-names>J.</given-names></name><name><surname>Szewczyk</surname><given-names>R.</given-names></name><name><surname>Anderson</surname><given-names>J.</given-names></name></person-group><article-title>Wireless Sensor Networks for Habitat Monitoring</article-title><conf-name>Proceedings of WSNA ’02</conf-name><conf-loc>Atlanta, GA, USA</conf-loc><conf-date>September 28, 2002</conf-date></citation></ref>
<ref id="b3-sensors-11-01682"><label>3.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Estrin</surname><given-names>D.</given-names></name><name><surname>Govindan</surname><given-names>R.</given-names></name><name><surname>Heidemann</surname><given-names>J.S.</given-names></name><name><surname>Kumar</surname><given-names>S.</given-names></name></person-group><article-title>Next Century Challenges: Scalable Coordination in Sensor Networks</article-title><conf-name>Proceedings of MobiCom</conf-name><conf-loc>Seattle, WA, USA</conf-loc><conf-date>August 1999</conf-date></citation></ref>
<ref id="b4-sensors-11-01682"><label>4.</label><citation citation-type="book"><person-group person-group-type="editor"><name><surname>Estrin</surname><given-names>D.</given-names></name><name><surname>Govindan</surname><given-names>R.</given-names></name><name><surname>Heidemann</surname><given-names>J.S.</given-names></name></person-group><article-title>Embedding the internet: Introduction</article-title><source>Commun. ACM</source><year>2000</year><volume>43</volume><fpage>75</fpage><lpage>82</lpage></citation></ref>
<ref id="b5-sensors-11-01682"><label>5.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Bonnet</surname><given-names>P.</given-names></name><name><surname>Gehrke</surname><given-names>J.</given-names></name><name><surname>Seshadri</surname><given-names>P.</given-names></name></person-group><article-title>Towards Sensor Database Systems</article-title><conf-name>Proceedings of International Conference on Mobile Data Management, MDM 2001 LNCS</conf-name><conf-loc>Hong Kong, China</conf-loc><conf-date>January 8–10, 2001</conf-date><comment>Volume 1987</comment><fpage>3</fpage><lpage>14</lpage></citation></ref>
<ref id="b6-sensors-11-01682"><label>6.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Madden</surname><given-names>S.</given-names></name><name><surname>Franklin</surname><given-names>M.J.</given-names></name><name><surname>Hellerstein</surname><given-names>J.M.</given-names></name><name><surname>Hong</surname><given-names>W.</given-names></name></person-group><article-title>TAG: A Tiny AGgregation Service for <italic>ad-hoc</italic> Sensor Networks</article-title><conf-name>Proceedings of OSDI’02</conf-name><conf-loc>Boston, MA, USA</conf-loc><conf-date>December 9–11, 2002</conf-date></citation></ref>
<ref id="b7-sensors-11-01682"><label>7.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Yu</surname><given-names>H.</given-names></name><name><surname>Lim</surname><given-names>E.</given-names></name><name><surname>Zhang</surname><given-names>J.</given-names></name></person-group><article-title>In-network Join Processing for Sensor Networks</article-title><conf-name>Proceedings 8th Asia-Pacific Web Conference, Frontiers of WWW Research and Development—APWeb 2006</conf-name><conf-loc>Harbin, China</conf-loc><conf-date>January 16–18, 2006</conf-date><fpage>263</fpage><lpage>274</lpage></citation></ref>
<ref id="b8-sensors-11-01682"><label>8.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gehrke</surname><given-names>J.</given-names></name><name><surname>Madden</surname><given-names>S.</given-names></name></person-group><article-title>Query processing in sensor networks</article-title><source>Pervasive Comput</source><year>2004</year><volume>3</volume><fpage>46</fpage><lpage>55</lpage><pub-id pub-id-type="doi">10.1109/MPRV.2004.1269131</pub-id></citation></ref>
<ref id="b9-sensors-11-01682"><label>9.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Coman</surname><given-names>A.</given-names></name><name><surname>Nascimento</surname><given-names>M.</given-names></name><name><surname>Sander</surname><given-names>J.</given-names></name></person-group><article-title>On Join Location in Sensor Networks</article-title><conf-name>Proceedings of MDM</conf-name><conf-loc>Mannheim, Germany</conf-loc><conf-date>May 1, 2007</conf-date><fpage>190</fpage><lpage>197</lpage></citation></ref>
<ref id="b10-sensors-11-01682"><label>10.</label><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname><given-names>Y.</given-names></name><name><surname>Gehrke</surname><given-names>J.E.</given-names></name></person-group><article-title>The cougar approach to in-network query processing in sensor networks</article-title><source>SIGMOD Record</source><year>2002</year><volume>31</volume><fpage>9</fpage><lpage>18</lpage><pub-id pub-id-type="doi">10.1145/601858.601861</pub-id></citation></ref>
<ref id="b11-sensors-11-01682"><label>11.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Chowdhary</surname><given-names>V.</given-names></name><name><surname>Gupta</surname><given-names>H.</given-names></name></person-group><article-title>Communication-Efficient Implementation of Join in Sensor Network</article-title><conf-name>Proceedings 10th International Conference Database Systems for Advanced Applications, DASFAA 2005</conf-name><conf-loc>Beijing, China</conf-loc><conf-date>April 17–20, 2005</conf-date><fpage>447</fpage><lpage>460</lpage></citation></ref>
<ref id="b12-sensors-11-01682"><label>12.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Yao</surname><given-names>Y.</given-names></name><name><surname>Gehrke</surname><given-names>J.</given-names></name></person-group><article-title>Query Processing for Sensor Networks</article-title><conf-name>Proceedings International Conference on Innovative Data System Research, IEEE Pervasive Computing 2003</conf-name><conf-loc>Monterey, CA, USA</conf-loc><comment>Volume 3</comment><fpage>46</fpage><lpage>55</lpage></citation></ref>
<ref id="b13-sensors-11-01682"><label>13.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Karp</surname><given-names>B.</given-names></name><name><surname>Kung</surname><given-names>H.T.</given-names></name></person-group><article-title>GPSR: Greedy Perimeter Stateless Routing for Wireless Networks</article-title><conf-name>Proceedings of MobiCom</conf-name><conf-loc>Boston, MA, USA</conf-loc><conf-date>August 2000</conf-date></citation></ref>
<ref id="b14-sensors-11-01682"><label>14.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Sun</surname><given-names>J.Z.</given-names></name></person-group><article-title>An Energy-Efficient Query Processing Algorithm for Wireless Sensor Networks</article-title><conf-name>Proceedings of 5th International Conference Ubiquitous Intelligence and Computing, UIC 2008</conf-name><conf-loc>Oslo, Norway</conf-loc><conf-date>June 23–25, 2008</conf-date><fpage>373</fpage><lpage>385</lpage></citation></ref>
<ref id="b15-sensors-11-01682"><label>15.</label><citation citation-type="confproc"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Z.</given-names></name><name><surname>Gao</surname><given-names>X.F.</given-names></name><name><surname>Zhang</surname><given-names>X.F.</given-names></name><name><surname>Wu</surname><given-names>W.L.</given-names></name><name><surname>Xiong</surname><given-names>H.</given-names></name></person-group><article-title>Three Approximation Algorithms for Energy-Efficient Query Dissemination in Sensor Database System</article-title><conf-name>Proceedings of 20th International Conference Database and Expert Systems Applications, DEXA 2009</conf-name><conf-loc>Linz, Austria</conf-loc><conf-date>August 31–September 4, 2009</conf-date><fpage>807</fpage><lpage>821</lpage></citation></ref></ref-list>
<sec sec-type="display-objects">
<title>Figures</title>
<fig id="f1-sensors-11-01682" position="float">
<label>Figure 1.</label>
<caption>
<p>General join strategies. <bold>(a)</bold> Naïve join. <bold>(b)</bold> Sequential join. <bold>(c)</bold> Centroid join.</p></caption>
<graphic xlink:href="sensors-11-01682f1a.gif"/>
<graphic xlink:href="sensors-11-01682f1b.gif"/></fig>
<fig id="f2-sensors-11-01682" position="float">
<label>Figure 2.</label>
<caption>
<p>Incremental join algorithm.</p></caption>
<graphic xlink:href="sensors-11-01682f2.gif"/></fig>
<fig id="f3-sensors-11-01682" position="float">
<label>Figure 3.</label>
<caption>
<p>Impact of selectivity.</p></caption>
<graphic xlink:href="sensors-11-01682f3.gif"/></fig>
<fig id="f4-sensors-11-01682" position="float">
<label>Figure 4.</label>
<caption>
<p>Comparison of IJA with the synopsis algorithm. <bold>(a)</bold> Comparison of synopsis with IJA. <bold>(b)</bold> Magnification for join selectivity &lt;= 0.02.</p></caption>
<graphic xlink:href="sensors-11-01682f4.gif"/></fig>
<fig id="f5-sensors-11-01682" position="float">
<label>Figure 5.</label>
<caption>
<p>Cases for varied synopsis rates. <bold>(a)</bold> Communication cost as the synopsis rate varies from 0% to 100%. <bold>(b)</bold> Comparison of IJA with the synopsis algorithm.</p></caption>
<graphic xlink:href="sensors-11-01682f5.gif"/></fig></sec></back></article>
