

\[ \mathrm{ird}(K, K^c) = \frac{|E| - a(K) + c(K)}{a(K) - c(K)}. \qquad (3.11) \]

Both the density-based metrics and the cut-based metrics are defined to evaluate a single cluster. These metrics can also be used to evaluate the performance of local graph clustering algorithms. For global graph clustering problems, the value of each cluster is first calculated, then the overall performance is evaluated by the statistics of all clusters, for example, the mean or the extreme value over all clusters.
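As an illustration of this evaluation scheme, the sketch below uses conductance (Eq. 3.9) as the per-cluster metric and aggregates it over all clusters. It assumes the standard cut-over-smaller-volume form of conductance; the function names are illustrative:

```python
import numpy as np

def conductance(A, cluster):
    """Conductance of a single cluster: cut size divided by the
    smaller of the two volumes (the standard form of Eq. 3.9)."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(cluster)] = True
    cut = A[mask][:, ~mask].sum()   # weight of edges leaving the cluster
    vol_in = A[mask].sum()          # total degree inside the cluster
    vol_out = A[~mask].sum()        # total degree outside the cluster
    return cut / min(vol_in, vol_out)

def evaluate_clustering(A, clusters):
    """Aggregate the per-cluster metric into global statistics."""
    scores = np.array([conductance(A, c) for c in clusters])
    return scores.mean(), scores.max()  # mean and extreme value over clusters
```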

3.3 Graph clustering methods

Various methods have been proposed for graph clustering problems. In this section, we briefly review some of the commonly used methods. In the next section, we will give details of random walk-based methods, which are not reviewed in this section. Note that even when methods fall into different categories, they are still closely linked with each other.

3.3.1 Spectral graph clustering

Spectral clustering is one of the most popular graph clustering algorithms [65, 106, 107]. It can be easily implemented and often gives satisfactory results. To derive the spectral graph clustering method, we start from the normalized cut measurement defined by Eq. 3.10 as the fitness function. We further define an assignment vector x such that

\[ x_i = \begin{cases} \dfrac{1}{|K|} & \text{if } i \in K \\[4pt] -\dfrac{1}{|K^c|} & \text{if } i \in K^c \end{cases} \qquad (3.12) \]

It can be derived from Eq. 3.10 that

\[ \mathrm{ncut}(K, K^c) = \frac{x^T L x}{x^T D x}, \qquad (3.13) \]

where L is the Laplacian matrix as defined in Eq. 2.2 and D is the degree matrix of the graph. Minimizing Eq. 3.13 with respect to x in a discrete manner is NP-hard [65]. However, we can relax the problem by letting x be a real-valued vector with the constraint that x^T D 1 = 0, where 1 is the vector whose elements are all 1s.

This problem can be solved as the generalized eigenvector problem Lx = λDx. Let y = D^{1/2}x; then y is the eigenvector corresponding to the second smallest eigenvalue of the normalized Laplacian given by Eq. 3.14.
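To see this, substitute x = D^{-1/2}y into Eq. 3.13:

\[ \frac{x^T L x}{x^T D x} = \frac{y^T D^{-1/2} L D^{-1/2} y}{y^T y}. \]

The right-hand side is the Rayleigh quotient of the matrix D^{-1/2} L D^{-1/2}. Under the constraint x^T D 1 = 0, equivalently y ⊥ D^{1/2}1 (the eigenvector of the smallest eigenvalue, which is 0), the quotient is minimized by the eigenvector corresponding to the second smallest eigenvalue.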

\[ L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2}. \qquad (3.14) \]

After the eigenvector is calculated, we can cluster the nodes according to their corresponding values in the eigenvector. This principle can easily be extended to k clusters. Instead of using only one eigenvector, multiple eigenvectors can also be used [106]. Algorithm 1 shows the details of the spectral graph clustering algorithm.

Normally, k-means clustering is used to cluster the nodes after the feature vectors are calculated. However, any data clustering algorithm can be used in this step [108, 109].

It should be noted that if the target is instead to minimize the ratio cut measurement, defined as

\[ \mathrm{rcut}(K, K^c) = \frac{c(K)}{|K|} + \frac{c(K^c)}{|K^c|}, \qquad (3.15) \]

then applying the same procedure as for the normalized cut, one ends up finding the eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix L of graph G(V, E) [110].

given: adjacency matrix A of graph G(V, E) and the number of clusters k
1. Compute the normalized Laplacian matrix L_sym by Eq. 3.14.
2. Compute the first k eigenvectors v_1, v_2, ..., v_k of L_sym.
3. Let V be the matrix with the columns v_1, v_2, ..., v_k.
4. Normalize the rows of V to have norm 1.
5. Use the rows of V as features and cluster the data points into k clusters.
6. Assign the nodes to the corresponding clusters given by the data clustering algorithm.

Algorithm 1: Spectral graph clustering using multiple eigenvectors
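To make the steps concrete, here is a minimal Python sketch of Algorithm 1 (an illustration rather than the thesis implementation; it assumes an undirected graph with no isolated nodes and uses scikit-learn's k-means for the final clustering step):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Cluster the nodes of an undirected graph, given by its adjacency
    matrix A, into k clusters using the eigenvectors of L_sym."""
    d = A.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}; assumes no isolated nodes
    L = np.diag(d) - A                      # unnormalized Laplacian (Eq. 2.2)
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # normalized Laplacian (Eq. 3.14)
    # eigh returns eigenvalues in ascending order; keep the first k eigenvectors
    _, V = eigh(L_sym, subset_by_index=[0, k - 1])
    # normalize the rows of V to unit norm before clustering
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    # use the rows of V as node features and cluster them with k-means
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```

For large sparse graphs, the dense eigendecomposition would be replaced by a sparse solver such as scipy.sparse.linalg.eigsh.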

3.3.2 Fitness function optimization

A large number of graph clustering algorithms choose to optimize a predefined fitness function using different optimization strategies.

Newman et al. [83, 111] studied the modularity of social networks and defined the modularity as

\[ Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad (3.16) \]

where i is the index of a community, e_{ii} is the fraction of the edges that connect the nodes in community i, and a_i = \sum_j e_{ij} is the fraction of the expected number of edges that connect the nodes in community i. Spielman and Teng opted to take the conductance (Eq. 3.9) as the fitness function [94]. Starting from the similarity and dissimilarity of two nodes, Veldt et al. studied the disagreement of the clusters and

defined the fitness function as

\[ \sum_{(i,j) \in E^+} (1-\lambda)\, x_{ij} + \sum_{(i,j) \in E^-} \lambda\, (1 - x_{ij}), \qquad (3.17) \]

where λ is a predefined weight, x_{ij} ∈ {0, 1} is the indicator of whether nodes i and j are in the same cluster, E^+ is the set of edges whose two end nodes are in the same cluster, and E^- = E \ E^+ is the complementary set of E^+ [112]. In [113], Görke et al. evaluated graph clustering algorithms that use different density measurements as their fitness function using two greedy heuristics: vertex moving and cluster merging. They show that the vertex moving approach produces more reliable results than the cluster merging approach. They also show the limitations of using different density-based measurements as the fitness function.
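As a concrete example of evaluating such a fitness function, modularity (Eq. 3.16) can be computed for a candidate clustering in a few lines of NumPy (a sketch for an undirected graph given by its adjacency matrix; the function name is illustrative):

```python
import numpy as np

def modularity(A, communities):
    """Modularity Q = sum_i (e_ii - a_i^2) from Eq. 3.16, where
    communities is a list of node-index collections."""
    two_m = A.sum()  # twice the number of edges of an undirected graph
    Q = 0.0
    for nodes in communities:
        idx = list(nodes)
        # fraction of edges whose both end nodes lie in community i
        e_ii = A[np.ix_(idx, idx)].sum() / two_m
        # fraction of edge end points attached to community i
        a_i = A[idx].sum() / two_m
        Q += e_ii - a_i ** 2
    return Q
```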

As a matter of fact, finding the optimal value of the fitness functions defined earlier is NP-hard. However, different optimization strategies have been applied to find locally optimal solutions.

Greedy Agglomeration [82, 83, 113, 114]

This method starts off with the state that each node is in its own cluster. At each iteration, the algorithm evaluates the improvement of the fitness function obtained by merging each pair of clusters and merges the pair with the largest improvement. This procedure is repeated until no improvement can be achieved by merging any pair of clusters. This method adopts the greedy optimization strategy and has a complexity of O((m+n)n), where m is the number of edges and n is the number of nodes. Obviously, this approach does not guarantee global optimality. A clear limitation of this approach is that once a node is assigned to a cluster, it will not be moved to another one. To address this limitation, in [115], Blondel et al. alternately apply two phases: in the first phase, nodes are moved between communities to optimize the modularity; in the second phase, a new graph is constructed by merging the communities in order to improve the fitness function. Both strategies are illustrated in the sketch below.
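Off-the-shelf implementations of both strategies exist; for example, with NetworkX (assuming a recent version, since louvain_communities requires NetworkX 2.8 or later):

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    louvain_communities,
)

G = nx.karate_club_graph()  # a small standard social network for illustration

# Greedy agglomeration: start from singleton clusters and repeatedly
# merge the pair of clusters that most improves modularity (Eq. 3.16).
greedy = greedy_modularity_communities(G)

# Two-phase approach of Blondel et al. [115]: alternate node moves and
# community merging until the modularity stops improving.
louvain = louvain_communities(G, seed=0)

print([sorted(c) for c in greedy])
print([sorted(c) for c in louvain])
```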

Relaxation

Optimizing fitness functions such as those defined in Eqs. 3.9, 3.10, 3.15, and 3.16 in a discrete manner is an NP-hard problem [116]. However, these problems can be relaxed by converting the hard assignment into a fuzzy assignment, similar to the strategy used in spectral clustering algorithms [65, 106, 107]. In [112], the authors studied the correlation clustering problem, whose objective is to minimize Eq. 3.17. This optimization problem is an integer linear programming problem; it can be relaxed to a linear programming problem, whose solution can be found in polynomial time [117].
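The relaxation of Eq. 3.17 can be sketched with SciPy's LP solver. In this illustration, E^+ is taken to be the graph's edge set and E^- the remaining node pairs, x_ij is treated as a distance-like variable (0 when i and j fall in the same cluster), triangle inequalities keep the relaxed x consistent, and the rounding step that turns the fractional solution back into clusters is omitted:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def correlation_clustering_lp(n, edges, lam=0.5):
    """LP relaxation of Eq. 3.17: x_ij in [0, 1] instead of {0, 1},
    with triangle inequalities x_ij <= x_ik + x_kj."""
    pairs = list(itertools.combinations(range(n), 2))
    idx = {p: k for k, p in enumerate(pairs)}
    e_plus = {tuple(sorted(e)) for e in edges}

    # objective: (1 - lam) x_ij on E^+ and -lam x_ij on E^-;
    # the dropped constant lam * |E^-| does not affect the minimizer
    c = np.array([(1 - lam) if p in e_plus else -lam for p in pairs])

    # triangle inequalities for every unordered triple {i, j, k}
    rows, b = [], []
    for i, j, k in itertools.combinations(range(n), 3):
        for lhs, r1, r2 in [((i, j), (i, k), (j, k)),
                            ((i, k), (i, j), (j, k)),
                            ((j, k), (i, j), (i, k))]:
            row = np.zeros(len(pairs))
            row[idx[lhs]], row[idx[r1]], row[idx[r2]] = 1.0, -1.0, -1.0
            rows.append(row)
            b.append(0.0)

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(b), bounds=(0, 1))
    return {p: res.x[idx[p]] for p in pairs}
```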

Other optimization techniques

Given a fitness function, an optimization technique can be applied to find the optimal or a suboptimal solution. In [118, 119], simulated annealing is used to optimize Eq. 3.16. Duch and Arenas adopted the extremal optimization approach to optimize the modularity fitness function [120]. In their approach, nodes are first ranked according to their contribution to the fitness function before moving to other clusters. Then nodes are moved based on the probability P(q) ∼ q^{-τ}, where q is the rank of the node and τ is the extremal optimization constant. In [121], mean field annealing is used to optimize Eq. 3.16.
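The rank-based move rule of Duch and Arenas can be sketched in a few lines (an illustration; fitness holds each node's contribution to the fitness function, and the value of τ is a hypothetical choice):

```python
import numpy as np

def pick_node_to_move(fitness, tau=1.4):
    """Rank nodes from worst to best contribution and pick one with
    probability P(q) ~ q^(-tau), so poorly placed nodes move most often."""
    rng = np.random.default_rng()
    order = np.argsort(fitness)         # order[0] is the worst node (rank q = 1)
    q = np.arange(1, len(fitness) + 1)  # ranks 1 .. n
    p = q ** (-float(tau))
    p /= p.sum()                        # normalize into a distribution
    return order[rng.choice(len(fitness), p=p)]
```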

3.3.3 Data clustering-based methods

Data clustering is an important topic in machine learning and has been comprehensively studied for many decades. Many graph clustering algorithms take advantage of the results achieved in data clustering. These algorithms first extract features for the nodes in the graph, and then apply a suitable data clustering method, such as k-means, to find clusters of nodes. The most critical part of these methods is to find suitable feature representations. Zhang et al. generate feature vectors for the nodes using the concept of limited random walks [P1]. In [122], Tian et al. learn node representations using a sparse deep autoencoder and take the activations of the last hidden layer as the features.

3.3.4 Model-based methods

Model-based methods assume that the graph is generated from a mathematical model and find clusters by optimizing a fitness function derived from the model. Chen et al. model a graph using a stochastic block model, by which a random graph is generated with a higher probability of linking in-cluster nodes than between-cluster nodes [123]. With the relaxation of the cluster matrix (a matrix that indicates whether two nodes are assigned to the same cluster), the graph clustering problem is converted to a convex optimization problem and can be solved effectively. The Potts model [124] studies interacting spins positioned on a lattice structure. Each spin can be in one of q states, where q is a predefined integer. Note that the Potts model is a generalization of the Ising model [125], in which each spin can be in one of two states. An energy function (Hamiltonian) is defined for the system.
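For reference, the Hamiltonian of the ferromagnetic q-state Potts model is commonly written as (the community detection formulations in [126, 127] adapt this form to graphs):

\[ H = -J \sum_{\langle i, j \rangle} \delta(\sigma_i, \sigma_j), \]

where the sum runs over interacting pairs of spins, σ_i ∈ {1, ..., q} is the state of spin i, δ is the Kronecker delta, and J > 0 is the coupling constant.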

In [126, 127], the q-state Potts model is used to detect communities in a graph by minimizing the Hamiltonian function.