Wednesday 3 April 2013

Consensus Clustering

Social networking sites like Facebook, Friendster and MySpace are essentially complex systems induced by friendship links. These networks are a rich source of personal information supplied by their users. One thing of interest, then, is discovering clusters, or communities, within such a network. Identifying communities in a social network gives us insight into the system and helps us understand how network topology and function affect each other.

Is there any difference between community detection and clustering?

In my view, there is essentially no difference between them. But if so much work has already been done on clustering, why is so much effort still going into community detection? Can't we simply use existing clustering algorithms to detect communities?

Why so much work on community detection when clustering algorithms already exist?

Clustering algorithms generally work well on data that is dense in nature. But are social networks dense in general? Real-world network data is typically SPARSE and LARGE SCALE. This makes it hard for classical clustering algorithms to work well on network data, and creates the need for efficient community detection algorithms.

The main points are summarized in the table below.

Community Detection                         | Clustering
--------------------------------------------|--------------------------------------------
Works on a large-scale matrix               | Works on a distance or similarity matrix
Deals with sparse matrices in general       | Works better on dense data than sparse data
Some of the algorithms are:                 | Some of the algorithms are:
1. Minimum-cut method                       | 1. Hierarchical clustering
2. GN algorithm (betweenness-based)         | 2. k-means algorithm
3. Modularity maximization                  | 3. Distribution-based EM algorithm

Let us say we use one of the good community detection algorithms on the network we are analyzing and end up with a partition. What guarantee do we have that the communities detected are valid and close to the ideal partition of the network? In other words, how can we validate the clustering algorithm?

So, some of the major issues are:
  • The definition of a community changes with context
  • The choice of initial seeds and of tie-breaking rules decides the clusters formed
  • Given many candidate partitions, how do we know which partition to choose?

What is Consensus Clustering and why should we use it?


It is a technique that combines a set of different partitions of the same network into a single partition that exploits the information contained in all of them. The goal of consensus clustering is to find the median, or consensus, partition, i.e., the partition that is most similar, on average, to all the input partitions. In the case of temporal networks, we can use consensus clustering to combine the partitions over all the time windows into a representative partition over the whole period. Consensus clustering generates stable results out of a set of partitions and enhances accuracy when combined with existing clustering algorithms.

If we view communities as subgraphs with a high internal edge density and a comparatively low external edge density, the task of any clustering method would be easier if we managed to further increase the internal edge density of the subgraphs, enhancing their cohesion, and to further decrease the edge density between the subgraphs, enhancing their separation. Ideally, if we could push this process to the extreme, i.e., by making the external edge density zero, we would end up with a set of disconnected cliques, which every clustering method would be able to identify. Consensus clustering induces this type of transformation and therefore mitigates the deficiencies of clustering algorithms, leading to more effective techniques.

Consensus clustering can be applied to both static and dynamic networks: in the static case we have a single snapshot of the network, whereas in the dynamic case the network is split over various time stamps. In a dynamic network, vertices and edges are added and removed over time. Let's see how the consensus-based approach is used to obtain graph partitions in both cases.

Reference [2] presents ConsensusClusterPlus, a tool for unsupervised class discovery implemented in R. It applies consensus clustering to determine the number and membership of possible clusters within a dataset, repeatedly subsampling from the set of items and clustering each subsample into the specified number of clusters.

Consensus for Static Clusters

The key step is to calculate the consensus matrix D, a weighted matrix based on the co-occurrence of vertices in clusters of the input partitions.

Input: network G with n vertices, a clustering algorithm A (any clustering algorithm will do), and a threshold t
  1. Apply A on G np times, yielding np partitions.
  2. Compute the consensus matrix D, where Dij is the number of partitions in which vertices i and j of G are assigned to the same cluster, divided by np.
  3. Set all entries of D below the chosen threshold t to zero.
  4. Apply A on D np times, yielding np partitions.
  5. If the np partitions are all equal, stop; otherwise go back to step 2.
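The steps above can be sketched in a few lines of Python. The clustering algorithm A is replaced here, for illustration only, by a trivial stand-in (connected components of the thresholded consensus matrix), and a single pass of steps 2–3 is shown; this is a minimal sketch, not the paper's implementation.

```python
from itertools import combinations

def consensus_matrix(partitions, n):
    """Step 2: D[i][j] = number of partitions in which vertices i and j
    share a cluster, divided by the number of partitions np."""
    D = [[0.0] * n for _ in range(n)]
    for part in partitions:                      # a partition = list of sets
        for cluster in part:
            for i, j in combinations(sorted(cluster), 2):
                D[i][j] += 1.0
                D[j][i] += 1.0
    k = len(partitions)
    return [[w / k for w in row] for row in D]

def apply_threshold(D, t):
    """Step 3: set all entries of D below the threshold t to zero."""
    return [[w if w >= t else 0.0 for w in row] for row in D]

def components(D):
    """Stand-in for algorithm A (an assumption, not part of the method):
    connected components of the thresholded matrix via depth-first search."""
    n, seen, parts = len(D), set(), []
    for s in range(n):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(u for u in range(n)
                         if D[v][u] > 0 and u not in comp)
        seen |= comp
        parts.append(comp)
    return parts

# Toy input: np = 3 partitions of n = 5 vertices.
parts = [
    [{0, 1}, {2, 3, 4}],
    [{0, 1, 2}, {3, 4}],
    [{0, 1}, {2, 3, 4}],
]
D = apply_threshold(consensus_matrix(parts, 5), 0.5)
print(components(D))   # -> [{0, 1}, {2, 3, 4}]
```

Note how the weak ties (vertex 2 with vertices 0 and 1, present in only one of three partitions) fall below the threshold and disappear, leaving the two stable blocks.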

Consensus for Dynamic Clusters

In the case of temporal networks, we represent the dynamics as a succession of snapshots corresponding to overlapping time windows. Suppose we have m windows of size t over a time range from t0 to tm, given by [t0, t0 + t], [t0 + 1, t0 + 1 + t], ..., [tm - t, tm]. The idea is to derive a consensus partition from each subset of r consecutive snapshots, for a chosen value of r: we combine snapshots 1 to r, then 2 to r + 1, and so on, until we reach the interval containing the last r snapshots.

Here, Dij is obtained by dividing the number of times vertices i and j are clustered together by the number of partitions whose snapshots contain both vertices. This normalization is needed because, in the dynamic case, new vertices may join the network and old ones may disappear.
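This normalization can be sketched directly; in the toy snapshots below (an assumption for illustration), one vertex exists only in the last two windows, so its entries are divided by 2 rather than by the total number of partitions:

```python
def dynamic_consensus_entry(i, j, partitions):
    """D[i][j] for the dynamic case: the number of partitions clustering
    i and j together, divided by the number of partitions whose snapshot
    contains BOTH vertices (vertices may appear and disappear over time)."""
    together = present = 0
    for part in partitions:              # part: list of sets of vertices
        members = set().union(*part)
        if i in members and j in members:
            present += 1
            together += any(i in c and j in c for c in part)
    return together / present if present else 0.0

# Toy snapshot partitions: vertex 3 appears only in the last two windows.
snaps = [
    [{0, 1}, {2}],
    [{0, 1, 2}],
    [{0, 1}, {2, 3}],
    [{0}, {1, 2, 3}],
]
print(dynamic_consensus_entry(2, 3, snaps))   # -> 1.0 (together in 2 of 2)
print(dynamic_consensus_entry(0, 1, snaps))   # -> 0.75 (together in 3 of 4)
```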

Once the consensus partitions for each time stamp have been computed, we face the problem of relating clusters at different times: it is not obvious whether a cluster at time t+1 has evolved from a given cluster at time t. A cluster may fragment, so a single cluster at time t may have many child clusters at time t+1. To assign a cluster at time t+1 uniquely to a cluster at time t, we compute the Jaccard index between the cluster at time t and every candidate cluster at time t+1, and pick the candidate with the highest Jaccard index. The Jaccard index J(A,B) between two sets A and B is the number of elements common to both A and B divided by the number of elements in their union.
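The matching step is short enough to sketch in full; the two example time slices below are assumptions chosen to show a cluster fragmenting:

```python
def jaccard(a, b):
    """J(A, B) = |A intersection B| / |A union B|."""
    return len(a & b) / len(a | b)

def match_clusters(clusters_t, clusters_t1):
    """For each cluster at time t, pick the cluster at time t+1
    with the highest Jaccard index."""
    return {frozenset(c): max(clusters_t1, key=lambda d: jaccard(c, d))
            for c in clusters_t}

at_t  = [{1, 2, 3, 4}, {5, 6, 7}]
at_t1 = [{1, 2}, {3, 4, 8}, {5, 6, 7, 9}]   # the first cluster fragments
for c, best in match_clusters(at_t, at_t1).items():
    print(sorted(c), '->', sorted(best))
```

Here {1, 2, 3, 4} is matched to {1, 2} (J = 2/4 = 0.5) rather than to {3, 4, 8} (J = 2/5 = 0.4), illustrating how ties to the largest surviving fragment are preferred.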


Example 

Let us consider a graph whose partitions, computed by some clustering algorithm A, are as shown in the figure below. Let us compute a partition using the consensus approach.


Step 1: We have np = 4 partitions over n = 7 vertices.

Step 2: We compute the Dij values of the matrix D.


D12 = (number of partitions in which vertices 1 and 2 occur in the same cluster) / np = 4 / 4 = 1


Similarly, D13 = 0.75, D23 = 0.75, D34 = 0.25, D35 = 0.25, D45 = 1, D46 = 0.5, D47 = 0.5, D56 = 0.5, D57 = 0.5, D67 = 1. (Note that D35 must equal D34: since D45 = 1, vertex 5 shares a cluster with vertex 4 in every partition, so vertex 3 is clustered with 5 whenever it is clustered with 4.)


Step 3: We choose a threshold, say 0.1, and set all Dij below 0.1 to zero. In our example nothing changes, as all the Dij values are above 0.1.


Step 4: Now we apply A on D np times and observe that the resulting partitions no longer change, so the procedure has converged. The appropriate partition is the one shown in the figure below, the consensus graph.
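As a check, the Dij values above can be reproduced in a few lines of Python. The four partitions below are an assumption (the original figure is not reproduced here), chosen to be consistent with every listed value:

```python
# Hypothetical input partitions over vertices 1..7 (an assumption,
# consistent with all the Dij values worked out above).
partitions = [
    [{1, 2, 3}, {4, 5, 6, 7}],
    [{1, 2, 3}, {4, 5, 6, 7}],
    [{1, 2, 3}, {4, 5}, {6, 7}],     # a 3-partition
    [{1, 2}, {3, 4, 5}, {6, 7}],     # another (inaccurate) 3-partition
]

def D(i, j):
    """Fraction of partitions placing vertices i and j in the same cluster."""
    same = sum(any(i in c and j in c for c in p) for p in partitions)
    return same / len(partitions)

print(D(1, 2))   # -> 1.0
print(D(1, 3))   # -> 0.75
print(D(3, 4))   # -> 0.25
print(D(4, 6))   # -> 0.5
```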

Transformation from the original graph to the consensus graph


In the above figure we can see how consensus clustering turns the four input graphs into the weighted consensus graph, using the approach discussed above. The thickness of each edge is proportional to its weight. We end up with two communities that have become cliques with heavy internal weights, while the connections between them have become light. This is achieved even though two of the four input partitions are inaccurate 3-partitions.

Results

To demonstrate the performance gain obtained by integrating consensus clustering with any given clustering method, the results were tested on benchmarks with built-in community structure.

The figure reports the value of the Normalized Mutual Information (NMI) between the planted partition of the benchmark and the partition found by the algorithm, as a function of the mixing parameter, which measures the degree of fuzziness of the clusters. Low values of the mixing parameter correspond to well-separated clusters that are easy to detect; as the mixing parameter increases, communities get more mixed and clustering algorithms find it harder to distinguish them, so all curves display a decreasing trend. The NMI equals 1 if the two partitions being compared are identical, and approaches 0 if they are very different. In the figure below, the graphs consist of 1000 and 5000 vertices. Each point is an average over 100 different graph realizations, and for every realization 150 partitions were produced. The "original" curve shows the average NMI between each individual partition and the benchmark partition.
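The NMI can be computed directly from two label assignments. A minimal sketch follows, using the common normalization by the average of the two entropies (the paper's exact normalization may differ):

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions given as
    label sequences: 1 for identical partitions, ~0 for independent ones."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in ca.values())   # entropy of A
    h_b = -sum(c / n * log(c / n) for c in cb.values())   # entropy of B
    mi = sum(c / n * log(c * n / (ca[a] * cb[b]))          # mutual information
             for (a, b), c in cab.items())
    if h_a + h_b == 0:
        return 1.0          # both partitions are a single cluster
    return 2 * mi / (h_a + h_b)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition, relabeled -> 1.0
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))   # independent partitions  -> 0.0
```

Note that the NMI is invariant under relabeling of the clusters, which is exactly what is needed when comparing a detected partition against the benchmark's planted one.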

In the result below,

Consensus refers to consensus clustering
Louvain refers to the Louvain method
LPM refers to the label propagation method
Clauset refers to the greedy modularity optimization method
SA refers to modularity optimization via simulated annealing






References

  1.  A. Lancichinetti, S. Fortunato: Consensus clustering in complex networks. Scientific Reports 2, 336 (2012). arXiv:1203.6093
  2.  Matthew D. Wilkerson, D. Neil Hayes : ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 26(12): 1572-1573 (2010)

