Clustering is the process of assigning a set of sensor nodes with similar attributes to a specified group, or cluster. In our research, we propose a new energy-efficient clustering algorithm that operates in two phases: a preliminary and a final clustering phase. In the preliminary phase, sensor nodes sensing the same category of data are placed in a distinct cluster. In the final phase, the remaining unclustered sensors estimate their divergence with respect to their clustered neighbors and ultimately join the least-divergent cluster.

### Preliminary clustering phase

The formation of preliminary clusters is purely distributed and is based on the sensed data. The proposed clustering method does not require predetermining the number of clusters, geographic positioning, or distance measures. We use a window function [24] to normalize the sensed data so as to scale the value within the range [0, 1]. Let *a* and *b* be the minimum and maximum values of the environmental parameter to be monitored, and *x*_{avg}(*t*) be the average of the set of data sensed during the time interval *t*. The window function *ϕ*( • ) is defined as follows:

\phi\left(x_{avg}(t), a, b\right) = \begin{cases} 1 & \frac{x_{avg}(t)}{b-a} \in [0, 0.2) \\ 2 & \frac{x_{avg}(t)}{b-a} \in [0.2, 0.4) \\ 3 & \frac{x_{avg}(t)}{b-a} \in [0.4, 0.6) \\ 4 & \frac{x_{avg}(t)}{b-a} \in [0.6, 0.8) \\ 5 & \frac{x_{avg}(t)}{b-a} \in [0.8, 1.0] \\ 0 & \text{otherwise} \end{cases}

(1)
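As an illustration, the window function of equation (1) can be sketched in a few lines. This is a minimal sketch, not the authors' implementation; the function name `window` and the float-boundary handling are our assumptions.

```python
def window(x_avg, a, b):
    """Map an averaged sensor reading to one of the five data formats
    of equation (1); returns 0 for out-of-bound readings.
    """
    r = x_avg / (b - a)        # normalization used in equation (1)
    if r < 0 or r > 1:
        return 0               # "otherwise" case: out-of-bound data
    if r == 1.0:
        return 5               # upper boundary belongs to format 5
    return int(r * 5) + 1      # intervals of width 0.2 map to formats 1..5
```

For example, with *a* = 0 and *b* = 1, a reading averaging 0.35 falls in [0.2, 0.4) and is mapped to format 2.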

The sensors use the window function to map their data into one of these formats. All nodes that sense the same format within 1-hop distance group together to form a preliminary cluster. In this initial phase, the node with maximum energy within the preliminary cluster is appointed as the cluster head. It maintains a duration timer to keep track of the period for which it has remained cluster head. Once appointed, the node functions as cluster head until its duration timer expires. On expiration of the timer, the role of cluster head rotates to another eligible node whose residual energy exceeds a predefined minimum energy threshold. This head rotation performs load balancing within the clusters. Moreover, the cluster head assigns a unique cluster id to all the cluster members.

Although the preliminary stage of cluster formation is simple to implement, in some situations (boundary-value or out-of-bound data sensing) a few nodes in the network might still remain unclustered. This problem is solved by our final clustering phase.

### Final clustering phase

The final clustering phase ensures that all the nodes in the sensor network get clustered. The process begins with an unclustered node discovering one or more clustered neighbors in its direct hop. The node then obtains the arrays of probabilities of the sensed data from those neighbors that are distinctly clustered. This procedure is further elaborated in the following section.

Each sensor node maintains the following information in its database, which eventually helps in calculating the divergence measure required for final clustering.

\Delta_n^s = \left\{ P^s = \left( p_1^s, p_2^s, p_3^s, \dots, p_n^s \right),\; p_i^s \ge 0,\; \sum_{i=1}^{n} p_i^s = 1 \right\}

(2)

where *p*_{i}^{s} is the probability of the *i*^{th} data format at sensor *s*, and the probability sequence is denoted by *P*^{s}.

#### Selection of divergence method

We know that the entropy of the source is given by Shannon’s entropy *H*(*P*):

H\left(P\right) = -\sum_{i=1}^{n} p_i \ln p_i

(3)

where *p*_{i} ∈ *P*^{s} and *P* is the Host or Local Probability Model (*LPM*) of the host sensor node. Moreover, the inaccuracy in the data is given by:

H\left(P \,\|\, T\right) = -\sum_{i=1}^{n} p_i \ln t_i

(4)

where *t*_{i} ∈ *T*^{s} and *T* is the Remote Probability Model (*RPM*) of the remote sensor node. On subtracting equation (3) from (4), we get the Kullback–Leibler directed divergence measure [25]:

D\left(P \,\|\, T\right) = H\left(P \,\|\, T\right) - H\left(P\right) = -\sum_{i=1}^{n} p_i \ln t_i + \sum_{i=1}^{n} p_i \ln p_i = \sum_{i=1}^{n} p_i \ln \frac{p_i}{t_i}

(5)

However, the divergence *D*(*P* || *T*) is not a symmetric measure, i.e. *D*(*P* || *T*) ≠ *D*(*T* || *P*), and hence it cannot be applied directly. Therefore, we consider the symmetric version of the Kullback–Leibler measure, known as Jeffrey’s (*J*) divergence measure [26], which can be derived as follows:

\begin{aligned} J\left(P \,\|\, T\right) &= D\left(P \,\|\, T\right) + D\left(T \,\|\, P\right) = \sum_{i=1}^{n} p_i \ln \frac{p_i}{t_i} + \sum_{i=1}^{n} t_i \ln \frac{t_i}{p_i} \\ &= \sum_{i=1}^{n} p_i \ln \frac{p_i}{t_i} - \sum_{i=1}^{n} t_i \ln \frac{p_i}{t_i} = \sum_{i=1}^{n} \left(p_i - t_i\right) \ln \frac{p_i}{t_i} \end{aligned}

(6)
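The derivation above translates directly into code. The following is a minimal sketch of equations (5) and (6), assuming strictly positive probability sequences of equal length (the logarithm is undefined at zero); the function names are ours.

```python
import math

def kl_divergence(p, t):
    """Kullback-Leibler directed divergence D(P || T), equation (5)."""
    return sum(pi * math.log(pi / ti) for pi, ti in zip(p, t))

def j_divergence(p, t):
    """Jeffrey's symmetric divergence J(P || T) = D(P || T) + D(T || P),
    computed in the closed form of equation (6)."""
    return sum((pi - ti) * math.log(pi / ti) for pi, ti in zip(p, t))
```

By construction, `j_divergence(p, t) == j_divergence(t, p)`, which is the symmetry property that motivates choosing Jeffrey’s measure over the directed divergence.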

#### Application of divergence measure

A divergence measure is a metric used for quantifying the degree of dissimilarity between two objects. In our clustering process, an unclustered node uses the divergence measure to analyze the extent to which it differs from each of its clustered neighbors, and eventually joins the cluster that exhibits maximum similarity (minimum divergence). Consequently, the clusters formed by the end of the final clustering phase are likely to be highly correlated. For simulation purposes, we employ Jeffrey’s divergence measure owing to its symmetric nature.

According to our strategy, every unclustered sensor node uses the *J*-divergence measure derived in equation (6) to calculate the divergence between itself and every clustered neighboring sensor node. The unclustered sensor *s* joins the clustered node \overline{s} whose divergence is the least among all clustered nodes (equation 7). This clustering process continues recursively until all nodes in the network are clustered.

J\left(T^{\overline{s}} \,\|\, P^s\right) = \min\left\{ J\left(T^1 \,\|\, P^s\right),\; J\left(T^2 \,\|\, P^s\right),\; \dots,\; J\left(T^z \,\|\, P^s\right) \right\}, \quad 1 \le \overline{s} \le z

(7)

where J\left(T^{\overline{s}} \,\|\, P^s\right) denotes the *J*-divergence measure between the {\overline{s}}^{\mathit{th}} clustered node and the *s*^{th} sensor node to be clustered.
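The selection rule of equation (7) amounts to a minimum over the candidate RPMs. A minimal sketch, assuming the unclustered node already holds the probability sequences of its *z* clustered neighbors in a dictionary (the function names are hypothetical):

```python
import math

def j_divergence(p, t):
    """Jeffrey's divergence from equation (6)."""
    return sum((pi - ti) * math.log(pi / ti) for pi, ti in zip(p, t))

def least_divergent_cluster(p_local, remote_models):
    """Pick the clustered neighbor whose RPM diverges least from the
    local LPM `p_local`, per equation (7).

    remote_models: dict mapping a neighbor/cluster id to its
    probability sequence T^1 .. T^z.
    """
    return min(remote_models,
               key=lambda nid: j_divergence(remote_models[nid], p_local))
```

Since J is symmetric, computing J(T || P) or J(P || T) yields the same ranking of candidate clusters.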

#### Exceptional cases

There are two exceptional cases that may arise while executing the final clustering phase. The first occurs at the beginning of the phase, when no clustered neighbors are found in the 1-hop vicinity; the node must then wait until it discovers one. The waiting period ends with the expiration of a *wait timer* (initialized at the beginning of the final clustering phase). The second case arises toward the end of the final clustering phase, when a node discovers itself isolated, i.e. none of its neighbors in the 1-hop vicinity are clustered yet. In that case, the node declares itself cluster head and forms a cluster with its 1-hop neighbors. This process continues until a clustered node is discovered, which initiates final clustering with the divergence measure. Since most of the nodes would already be clustered (into the least-divergent cluster) in the final phase, only a few nodes would confront such isolation.
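The decision logic of the final phase, including both exceptional cases, can be sketched as a single per-node step. This is our own simplified model, not the authors' implementation: the `Node` class, the `wait_deadline` field, and the single-step driver are assumptions, and each neighbor's LPM plays the role of its RPM as seen from the deciding node.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional

def j_divergence(p, t):
    # Jeffrey's divergence from equation (6)
    return sum((pi - ti) * math.log(pi / ti) for pi, ti in zip(p, t))

@dataclass
class Node:
    id: int
    lpm: List[float]                      # local probability model P^s
    neighbors: List["Node"] = field(default_factory=list)
    cluster_id: Optional[int] = None
    wait_deadline: float = 0.0            # expiry time of the wait timer

def final_phase_step(node: Node, now: float) -> None:
    """One decision step of the final clustering phase for one node."""
    if node.cluster_id is not None:
        return                            # already clustered
    clustered = [n for n in node.neighbors if n.cluster_id is not None]
    if clustered:
        # case 0: join the least-divergent clustered neighbor (equation 7)
        best = min(clustered, key=lambda n: j_divergence(n.lpm, node.lpm))
        node.cluster_id = best.cluster_id
    elif now >= node.wait_deadline:
        # case 2: isolated after the wait timer expires -> become cluster
        # head and absorb unclustered 1-hop neighbors
        node.cluster_id = node.id
        for n in node.neighbors:
            if n.cluster_id is None:
                n.cluster_id = node.id
    # case 1 (implicit): no clustered neighbor and timer still running ->
    # do nothing this step and keep waiting
```

Repeatedly applying `final_phase_step` to every unclustered node models the recursive progress of the final phase until the whole network is clustered.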