

\[ \mathrm{ird}(K, K^c) = \frac{|E| - a(K) + c(K)}{a(K) - c(K)}. \qquad (3.11) \]

Both the density-based metrics and the cut-based metrics are defined to evaluate a single cluster. These metrics can also be used to evaluate the performance of local graph clustering algorithms. For global graph clustering problems, the value of each cluster is first calculated, then the overall performance is evaluated by the statistics of all clusters, for example, the mean or the extreme value over all clusters.
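As an illustration of this evaluation scheme, the sketch below uses conductance (Eq. 3.9) as the per-cluster metric and aggregates it over all clusters. It assumes the standard cut-over-smaller-volume form of conductance; the function names are illustrative:

```python
import numpy as np

def conductance(A, cluster):
    """Conductance of a single cluster: cut size divided by the
    smaller of the two volumes (the standard form of Eq. 3.9)."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[list(cluster)] = True
    cut = A[mask][:, ~mask].sum()   # weight of edges leaving the cluster
    vol_in = A[mask].sum()          # total degree inside the cluster
    vol_out = A[~mask].sum()        # total degree outside the cluster
    return cut / min(vol_in, vol_out)

def evaluate_clustering(A, clusters):
    """Aggregate the per-cluster metric into global statistics."""
    scores = np.array([conductance(A, c) for c in clusters])
    return scores.mean(), scores.max()  # mean and extreme value over clusters
```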

3.3 Graph clustering methods

Various methods have been proposed for graph clustering problems. In this section, we briefly review some of the commonly used methods. In the next section, we will give details of random walk-based methods, which are not reviewed in this section. Note that even when methods fall into different categories, they are still closely linked with each other.

3.3.1 Spectral graph clustering

Spectral clustering is one of the most popular graph clustering algorithms [65, 106, 107]. It can be easily implemented and often gives satisfactory results. To derive the spectral graph clustering method, we start from the normalized cut measurement defined by Eq. 3.10 as the fitness function. We further define an assignment vector x such that

\[ x_i = \begin{cases} \dfrac{1}{|K|} & \text{if } i \in K \\[4pt] -\dfrac{1}{|K^c|} & \text{if } i \in K^c \end{cases} \qquad (3.12) \]

It can be derived from Eq. 3.10 that

\[ \mathrm{ncut}(K, K^c) = \frac{x^T L x}{x^T D x}, \qquad (3.13) \]

where L is the Laplacian matrix as defined in Eq. 2.2 and D is the degree matrix of the graph. Minimizing Eq. 3.13 with respect to x in a discrete manner is NP-hard [65]. However, we can relax the problem by letting x be a real-valued vector with the constraint that x^T D 1 = 0, where 1 is the vector whose elements are all 1s.

This problem can be solved as the generalized eigenvector problem Lx = λDx. Let y = D^{1/2}x; then y is the eigenvector corresponding to the second smallest eigenvalue of the normalized Laplacian given by Eq. 3.14.
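To see this, substitute x = D^{-1/2}y into Eq. 3.13:

\[ \frac{x^T L x}{x^T D x} = \frac{y^T D^{-1/2} L D^{-1/2} y}{y^T y}. \]

The right-hand side is the Rayleigh quotient of the matrix D^{-1/2} L D^{-1/2}. Under the constraint x^T D 1 = 0, equivalently y ⊥ D^{1/2}1 (the eigenvector of the smallest eigenvalue, which is 0), the quotient is minimized by the eigenvector corresponding to the second smallest eigenvalue.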

\[ L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2}. \qquad (3.14) \]

After the eigenvector is calculated, we can cluster the nodes according to their corresponding values in the eigenvector. This principle can easily be extended to k clusters. Instead of using only one eigenvector, multiple eigenvectors can also be used [106]. Algorithm 1 shows the details of the spectral graph clustering algorithm.

Normally, k-means clustering is used to cluster the nodes after the feature vectors are calculated. However, any data clustering algorithm can be used in this step [108, 109].

It should be noted that if the target is instead to minimize the ratio cut measurement, defined as

\[ \mathrm{rcut}(K, K^c) = \frac{c(K)}{|K|} + \frac{c(K^c)}{|K^c|}, \qquad (3.15) \]

then applying the same procedure as for the normalized cut, one ends up finding the eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix L of graph G(V, E) [110].

given: adjacency matrix A of graph G(V, E) and the number of clusters k
1. Compute the normalized Laplacian matrix L_sym by Eq. 3.14.
2. Compute the first k eigenvectors v_1, v_2, ..., v_k of L_sym.
3. Let V be the matrix with the columns v_1, v_2, ..., v_k.
4. Normalize the rows of V to have norm 1.
5. Use the rows of V as features and cluster the data points into k clusters.
6. Assign the nodes to the corresponding clusters given by the data clustering algorithm.

Algorithm 1: Spectral graph clustering using multiple eigenvectors
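To make the steps concrete, here is a minimal Python sketch of Algorithm 1 (an illustration rather than the thesis implementation; it assumes an undirected graph with no isolated nodes and uses scikit-learn's k-means for the final clustering step):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Cluster the nodes of an undirected graph, given by its adjacency
    matrix A, into k clusters using the eigenvectors of L_sym."""
    d = A.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}; assumes no isolated nodes
    L = np.diag(d) - A                      # unnormalized Laplacian (Eq. 2.2)
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt     # normalized Laplacian (Eq. 3.14)
    # eigh returns eigenvalues in ascending order; keep the first k eigenvectors
    _, V = eigh(L_sym, subset_by_index=[0, k - 1])
    # normalize the rows of V to unit norm before clustering
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    # use the rows of V as node features and cluster them with k-means
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```

For large sparse graphs, the dense eigendecomposition would be replaced by a sparse solver such as scipy.sparse.linalg.eigsh.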

3.3.2 Fitness function optimization

A large number of graph clustering algorithms choose to optimize a predefined fitness function using different optimization strategies.

Newman et al. [83, 111] studied the modularity of social networks and defined the modularity as

\[ Q = \sum_i \left( e_{ii} - a_i^2 \right), \qquad (3.16) \]

where i is the index of a community, e_{ii} is the fraction of the edges that connect the nodes in community i, and a_i = \sum_j e_{ij} is the fraction of the expected number of edges that connect the nodes in community i. Spielman and Teng opted to take the conductance (Eq. 3.9) as the fitness function [94]. Starting from the similarity and dissimilarity of two nodes, Veldt et al. studied the disagreement of the clusters and

defined the fitness function as

\[ \sum_{(i,j) \in E^+} (1-\lambda)\, x_{ij} + \sum_{(i,j) \in E^-} \lambda\, (1 - x_{ij}), \qquad (3.17) \]

where λ is a predefined weight, x_{ij} ∈ {0, 1} is the indicator of whether nodes i and j are in the same cluster, E^+ is the set of edges whose two end nodes are in the same cluster, and E^- = E \ E^+ is the complementary set of E^+ [112]. In [113], Görke et al. evaluated graph clustering algorithms that use different density measurements as their fitness function using two greedy heuristics: vertex moving and cluster merging. They show that the vertex moving approach produces more reliable results than the cluster merging approach. They also show the limitations of using different density-based measurements as the fitness function.
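As a concrete example of evaluating such a fitness function, modularity (Eq. 3.16) can be computed for a candidate clustering in a few lines of NumPy (a sketch for an undirected graph given by its adjacency matrix; the function name is illustrative):

```python
import numpy as np

def modularity(A, communities):
    """Modularity Q = sum_i (e_ii - a_i^2) from Eq. 3.16, where
    communities is a list of node-index collections."""
    two_m = A.sum()  # twice the number of edges of an undirected graph
    Q = 0.0
    for nodes in communities:
        idx = list(nodes)
        # fraction of edges whose both end nodes lie in community i
        e_ii = A[np.ix_(idx, idx)].sum() / two_m
        # fraction of edge end points attached to community i
        a_i = A[idx].sum() / two_m
        Q += e_ii - a_i ** 2
    return Q
```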

As a matter of fact, finding the optimal value of the fitness functions defined earlier is NP-hard. However, different optimization strategies have been applied to find locally optimal solutions.

Greedy Agglomeration [82, 83, 113, 114]

This method starts off with the state that each node is in its own cluster. At each iteration, the algorithm evaluates the improvement of the fitness function obtained by merging each pair of clusters and merges the pair with the largest improvement. This procedure is repeated until no improvement can be achieved by merging any pair of clusters. This method adopts the greedy optimization strategy and has a complexity of O((m+n)n), where m is the number of edges and n is the number of nodes. Obviously, this approach does not guarantee global optimality. A clear limitation of this approach is that once a node is assigned to a cluster, it will not be moved to another one. To address this limitation, in [115], Blondel et al. alternately apply two phases: in the first phase, nodes are moved between communities to optimize the modularity; in the second phase, a new graph is constructed by merging the communities in order to improve the fitness function. Both strategies are illustrated in the sketch below.
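Off-the-shelf implementations of both strategies exist; for example, with NetworkX (assuming a recent version, since louvain_communities requires NetworkX 2.8 or later):

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    louvain_communities,
)

G = nx.karate_club_graph()  # a small standard social network for illustration

# Greedy agglomeration: start from singleton clusters and repeatedly
# merge the pair of clusters that most improves modularity (Eq. 3.16).
greedy = greedy_modularity_communities(G)

# Two-phase approach of Blondel et al. [115]: alternate node moves and
# community merging until the modularity stops improving.
louvain = louvain_communities(G, seed=0)

print([sorted(c) for c in greedy])
print([sorted(c) for c in louvain])
```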

Relaxation

Optimizing fitness functions such as those defined in Eqs. 3.9, 3.10, 3.15, and 3.16 in a discrete manner is an NP-hard problem [116]. However, these problems can be relaxed by converting the hard assignment into a fuzzy assignment, similar to the strategy used in spectral clustering algorithms [65, 106, 107]. In [112], the authors studied the correlation clustering problem, whose objective is to minimize Eq. 3.17. This optimization problem is an integer linear programming problem; it can be relaxed to a linear programming problem, whose solution can be found in polynomial time [117].
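The relaxation of Eq. 3.17 can be sketched with SciPy's LP solver. In this illustration, E^+ is taken to be the graph's edge set and E^- the remaining node pairs, x_ij is treated as a distance-like variable (0 when i and j fall in the same cluster), triangle inequalities keep the relaxed x consistent, and the rounding step that turns the fractional solution back into clusters is omitted:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def correlation_clustering_lp(n, edges, lam=0.5):
    """LP relaxation of Eq. 3.17: x_ij in [0, 1] instead of {0, 1},
    with triangle inequalities x_ij <= x_ik + x_kj."""
    pairs = list(itertools.combinations(range(n), 2))
    idx = {p: k for k, p in enumerate(pairs)}
    e_plus = {tuple(sorted(e)) for e in edges}

    # objective: (1 - lam) x_ij on E^+ and -lam x_ij on E^-;
    # the dropped constant lam * |E^-| does not affect the minimizer
    c = np.array([(1 - lam) if p in e_plus else -lam for p in pairs])

    # triangle inequalities for every unordered triple {i, j, k}
    rows, b = [], []
    for i, j, k in itertools.combinations(range(n), 3):
        for lhs, r1, r2 in [((i, j), (i, k), (j, k)),
                            ((i, k), (i, j), (j, k)),
                            ((j, k), (i, j), (i, k))]:
            row = np.zeros(len(pairs))
            row[idx[lhs]], row[idx[r1]], row[idx[r2]] = 1.0, -1.0, -1.0
            rows.append(row)
            b.append(0.0)

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(b), bounds=(0, 1))
    return {p: res.x[idx[p]] for p in pairs}
```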

Other optimization techniques

Given a fitness function, an optimization technique can be applied to find the optimal or a suboptimal solution. In [118, 119], simulated annealing is used to optimize Eq. 3.16. Duch and Arenas adopted the extremal optimization approach to optimize the modularity fitness function [120]. In their approach, nodes are first ranked according to their contribution to the fitness function before moving to other clusters. Then nodes are moved based on the probability P(q) ∼ q^{-τ}, where q is the rank of the node and τ is the extremal optimization constant. In [121], mean field annealing is used to optimize Eq. 3.16.
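The rank-based move rule of Duch and Arenas can be sketched in a few lines (an illustration; fitness holds each node's contribution to the fitness function, and the value of τ is a hypothetical choice):

```python
import numpy as np

def pick_node_to_move(fitness, tau=1.4):
    """Rank nodes from worst to best contribution and pick one with
    probability P(q) ~ q^(-tau), so poorly placed nodes move most often."""
    rng = np.random.default_rng()
    order = np.argsort(fitness)         # order[0] is the worst node (rank q = 1)
    q = np.arange(1, len(fitness) + 1)  # ranks 1 .. n
    p = q ** (-float(tau))
    p /= p.sum()                        # normalize into a distribution
    return order[rng.choice(len(fitness), p=p)]
```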

3.3.3 Data clustering-based methods

Data clustering is an important topic in machine learning and has been comprehensively studied for many decades. Many graph clustering algorithms take advantage of the results achieved in data clustering. These algorithms first extract features for the nodes in the graph, and then apply a suitable data clustering method, such as k-means, to find clusters of nodes. The most critical part of these methods is to find suitable feature representations. Zhang et al. generate feature vectors for the nodes using the concept of limited random walks [P1]. In [122], Tian et al. learn node representations using a sparse deep autoencoder and take the activations of the last hidden layer as the features.

3.3.4 Model-based methods

Model-based methods assume that the graph is generated from a mathematical model and find clusters by optimizing a fitness function derived from the model. Chen et al. model a graph using a stochastic block model, by which a random graph is generated with a higher probability of linking in-cluster nodes than between-cluster nodes [123]. With the relaxation of the cluster matrix (a matrix that indicates whether two nodes are assigned to the same cluster), the graph clustering problem is converted to a convex optimization problem and can be solved effectively. The Potts model [124] studies interacting spins positioned on a lattice structure. Each spin can be in one of q states, where q is a predefined integer. Note that the Potts model is a generalization of the Ising model [125], in which each spin can be in one of two states. An energy function (Hamiltonian) is defined for the system.
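For reference, the Hamiltonian of the ferromagnetic q-state Potts model is commonly written as (the community detection formulations in [126, 127] adapt this form to graphs):

\[ H = -J \sum_{\langle i, j \rangle} \delta(\sigma_i, \sigma_j), \]

where the sum runs over interacting pairs of spins, σ_i ∈ {1, ..., q} is the state of spin i, δ is the Kronecker delta, and J > 0 is the coupling constant.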

In [126, 127], the q-state Potts model is used to detect communities in a graph by minimizing the Hamiltonian function.