
Network analysis gives an overall view of network dynamics by considering the characteristics of all participants in the network. For individual-level analysis, node-level measures are described in section 4.4.1 and, based upon their nature, are selected for the identification of influence maximizers in section 4.4.2. The clustering technique used to classify individuals according to their influencing power, and the statistical analysis used to compare the clusters, are explained in section 4.4.3.

4.4.1 Node Level Measures

Node level centrality measures are calculated to estimate the influence power of each node so that influential nodes can be identified. The measures below are calculated in addition to the commonly used centrality measures, such as degree centrality (Hansen et al., 2011).

K-core Decomposition

K-core decomposition is a technique that divides the network into layers and provides a hierarchical structure of the network based upon cores. The k-core is the maximal subgraph in which every vertex has degree at least k within that subgraph. The coreness value of a vertex is k if it belongs to the k-core but not to the (k+1)-core (Alvarez-Hamelin et al., 2005). The coreness of each student indicates the student's importance in the network, which helps to find the influence maximizers in a computationally efficient way (Malliaros et al., 2016).
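The peeling idea behind coreness can be sketched in plain Python. This is a minimal illustration, not the implementation used in this study; the function name `coreness` and the adjacency format (a dictionary mapping each node to the set of its neighbors in an undirected graph) are assumptions made for the example.

```python
def coreness(adj):
    """Return {node: coreness} for an undirected graph given as {node: set_of_neighbors}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # The next shell index is at least the smallest remaining degree.
        k = max(k, min(degree[v] for v in remaining))
        # Repeatedly peel every node whose remaining degree is <= k.
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already peeled via a duplicate entry
            remaining.remove(v)
            core[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return core
```

For a triangle with one extra student attached to a single corner, the three triangle members would receive coreness 2 and the attached student coreness 1.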

Max Neighborhood Component (MNC)

The maximum neighborhood component of a vertex v is the number of nodes in the largest connected component of the subgraph formed by the neighbors of v (Lin et al., 2008). A high MNC value means that a student is connected to a large group of interconnected neighboring students, and it indicates how well that student can spread information (Rossi et al., 2018).
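A minimal Python sketch of MNC follows, assuming the usual reading of the measure as the size of the largest connected component in the subgraph induced by the neighbors of v. The function name `mnc` and the adjacency format (node mapped to a set of neighbors) are illustrative assumptions.

```python
def mnc(adj, v):
    """Size of the largest connected component in the subgraph induced by v's neighbors."""
    neighbors = set(adj[v])
    seen = set()
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first search restricted to the neighborhood of v.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w in neighbors and w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best
```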

Closeness Centrality

Closeness centrality is the reciprocal of the sum of the lengths of the shortest paths between a student and all other students in the network (Hansen et al., 2011; Geum & Kim, 2020).

A student with high closeness centrality is close to, and within easy reach of, all other students. It also indicates the influential importance of the student within the network.
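The definition can be illustrated with a breadth-first search on an unweighted graph. This is only a sketch: the function name `closeness` is hypothetical, no normalization is applied, and unreachable nodes are simply left out of the sum, which is an assumption rather than a choice documented in this study.

```python
from collections import deque

def closeness(adj, source):
    """Reciprocal of the sum of shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    # An isolated node has no paths to sum, so its score is set to 0 here.
    return 1.0 / total if total > 0 else 0.0
```

On a path of three students a, b, c, the middle student b has distances 1 and 1 and therefore closeness 1/2, while each end student has distances 1 and 2 and closeness 1/3.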

Betweenness Centrality

Betweenness centrality measures the extent to which a student lies on the shortest paths between other students and thereby carries the flow of information from one student to another (Hansen et al., 2011; Geum & Kim, 2020). A student with high betweenness centrality has more influential power because it mediates the transfer of information between other students.
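In its usual unnormalized form, the measure can be written as follows. The notation is standard rather than taken from this study: \sigma_{st} is the number of shortest paths between students s and t, and \sigma_{st}(v) is the number of those paths that pass through student v.

```latex
C_B(v) = \sum_{\substack{s \neq v \neq t \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}
```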

Eigen Centrality

Eigen centrality calculates a relative influence score. The main idea of this measure is to focus on the quality of connections instead of their quantity: a student connected to high-scoring students receives a higher score than a student with the same number of connections to lower-scoring students (Hansen et al., 2011; Geum & Kim, 2020).

In other words, this measure also considers the influence of neighbors. If the neighbors are more important, the student is also more important, and vice versa.
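One common way to compute such a score is power iteration, sketched below in plain Python. The function name `eigen_centrality`, the fixed number of iterations, and the scaling of the largest score to 1 are assumptions made for the illustration; the sketch also assumes a connected, undirected graph.

```python
def eigen_centrality(adj, iterations=200):
    """Approximate eigenvector centrality by power iteration, scaled so the maximum is 1."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Adding the node's own score (a shift by the identity matrix) keeps the
        # iteration from oscillating on bipartite graphs; the limit vector is unchanged.
        new = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        score = {v: x / top for v, x in new.items()}
    return score
```

In a star network the central student ends up with score 1, and every outer student with a lower score, because each outer student's only neighbor is the center.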

Cross Clique Centrality

A clique is a subgraph in which every two distinct vertices are adjacent to each other. Cross clique centrality is the number of cliques to which a vertex belongs. Students with a high cross-clique centrality are therefore part of many cliques and can cause diffusion of information in the network (Faghani et al., 2013; Saqr & Viberg, 2020).
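A brute-force Python sketch of this count is given below for small networks. It counts every clique of at least two nodes, not only maximal cliques; whether all cliques or only maximal ones should be counted is not stated here, so that choice, the function name `cross_clique`, and the `min_size` parameter are assumptions. The enumeration is exponential in the worst case.

```python
def cross_clique(adj, min_size=2):
    """Count, for each node, the cliques of at least min_size nodes that contain it."""
    order = {v: i for i, v in enumerate(adj)}
    counts = {v: 0 for v in adj}

    def extend(clique, candidates):
        # Every candidate is adjacent to all current clique members.
        if len(clique) >= min_size:
            for member in clique:
                counts[member] += 1
        for v in candidates:
            # Keep only candidates that come later in the ordering and are
            # adjacent to v, so each clique is generated exactly once.
            later = {u for u in candidates if order[u] > order[v] and u in adj[v]}
            extend(clique + [v], later)

    extend([], set(adj))
    return counts
```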

Diffusion Degree

The diffusion degree shows the cumulative diffusion power of a student and its neighbors (Kundu et al., 2011; Banerjee et al., 2013). A student with a high diffusion degree has more influence and diffusion power in the network.
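A minimal sketch follows, assuming one common formulation in which each node contributes its degree weighted by a propagation probability, and a node's diffusion degree is its own contribution plus those of its neighbors. The single uniform probability `lam` and the function name `diffusion_degree` are assumptions for the example, not details given in this study.

```python
def diffusion_degree(adj, lam=1.0):
    """Own weighted degree plus the weighted degrees of all direct neighbors."""
    # Contribution of each node: propagation probability times degree.
    own = {v: lam * len(adj[v]) for v in adj}
    return {v: own[v] + sum(own[u] for u in adj[v]) for v in adj}
```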

Collective Influence

Collective influence is the product of the reduced degree of a node (its degree minus one) and the sum of the reduced degrees of all nodes at distance d from it (3 by default). Morone et al. (2016) used it to find influential nodes. A student with high collective influence is connected to students who are themselves connected to many others, which results in the spread of information in the network.
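The computation can be sketched with a breadth-first search that stops at distance d. The function name `collective_influence` and the adjacency format are illustrative assumptions, and "reduced degree" is taken to mean degree minus one.

```python
from collections import deque

def collective_influence(adj, node, d=3):
    """(deg(node) - 1) times the sum of (deg(j) - 1) over nodes j exactly d steps away."""
    dist = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            frontier_sum += len(adj[u]) - 1
            continue  # do not explore beyond distance d
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return (len(adj[node]) - 1) * frontier_sum
```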

Clustering Coefficient

The clustering coefficient measures the tendency of a student's neighbors to form clusters with one another. If a student is connected to a friend and to a friend of that friend, information spreads more easily and the clustering coefficient is high (Liebig & Rao, 2014). Such a student becomes significant with respect to information-spreading capability.
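The local clustering coefficient is commonly computed as the fraction of pairs of a node's neighbors that are themselves connected. The sketch below follows that definition; the function name `local_clustering` and the convention of returning 0 for nodes with fewer than two neighbors are assumptions.

```python
def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are directly connected to each other."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # no neighbor pairs exist
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))
```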

Gravity Centrality

Gravity centrality is based upon Newton's law of gravitation: the k-shell values of nodes u and v play the role of the masses and are divided by the distance between the two nodes (Simsek et al., 2020). A student with high gravity centrality exerts a stronger force of gravity, that is, stronger interaction with other nodes, and has more influential power to spread information in the network.
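One commonly used formulation of this idea is shown below. It is a sketch under assumptions: the squared distance follows the Newtonian analogy, and the notation is introduced here for illustration, with ks(v) the k-shell value of node v, d_{uv} the shortest-path distance between u and v, and \psi_v the set of nodes within a chosen radius of v.

```latex
G(v) = \sum_{u \in \psi_v} \frac{ks(v)\, ks(u)}{d_{uv}^{2}}
```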

4.4.2 Identification of Influence Maximizers

To find the participants with greater influential power and control over the network, the best available technique is to cluster the nodes based upon the calculated centrality measures. Because there is no labeled data, clustering techniques can help to differentiate between the users.

The k-means clustering technique is used for this purpose, as it is one of the most popular clustering techniques at the moment. K-means divides the nodes into clusters based upon the centrality measures. Nodes with similar influential power are clustered together, which is the objective of forming the clusters.

There are some prerequisites before moving to the k-means algorithm. The data is normalized between 0 and 1 using the following function:

normalize <- function(val) (val - min(val)) / (max(val) - min(val))
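The normalization and the subsequent clustering step can be sketched together in plain Python. This is an illustration under assumptions, not the procedure run in this study: `normalize` is a Python rendering of the same min-max formula with an added guard for constant columns, and `kmeans` is a basic version of Lloyd's algorithm whose random initialization, `seed`, and `iterations` parameters are hypothetical.

```python
import random

def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: the formula would divide by zero
    return [(x - lo) / (hi - lo) for x in values]

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k data points as initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels
```

Each centrality measure would be normalized as its own column before the per-student tuples are passed to `kmeans`, so that measures with large raw values do not dominate the distance.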

The optimum number of clusters is also found for all networks using the elbow, silhouette, and NbClust methods, which are commonly used. The mean, median, and standard deviation (SD) of each cluster are also computed to analyze the quality of the clusters.

Finally, cluster comparison is performed using the ANOVA technique to verify the validity of the clusters. ANOVA is chosen because the t-test cannot compare more than two samples, whereas with exactly two samples ANOVA and the t-test behave the same. For the given values of a centrality measure, the null hypothesis is that the mean of all clusters is the same, and the alternative hypothesis is that the means differ.

For the clusters to be valid, they must have different means; that is why they are different clusters. Otherwise, there should be only one cluster. In this way, it can also be highlighted which measures play major roles in the formation of the clusters.

H0: The mean of the centrality measure is equal across clusters
H1: The mean of the centrality measure is not equal across clusters

The Tukey HSD test is also used along with ANOVA to perform pairwise comparisons if there are more than two clusters of students in a network.
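The statistic behind a one-way ANOVA can be sketched as follows. The sketch computes only the standard F statistic, which compares variation between group means with variation within groups; obtaining the p-value additionally requires the F distribution and is left to a statistics package. The function name `one_way_anova_f` is hypothetical, and the sketch assumes at least two groups with some within-group variation.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

Here each group would hold the values of one centrality measure for the students of one cluster; a large F value indicates that the clusters differ on that measure.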

5 RESULTS

The results of a research study provide an overview of the outcomes of the whole work. They reveal important facts and help to determine whether the hypothesis is acceptable or not.

This section presents the outcomes of the whole study. Section 5.1 describes all networks based upon network measures, including the rich clubs in the networks. It also compares the network measures against null models, which highlight the lower and upper bounds of the network measures. Section 5.2 highlights the generative factors behind the formation of the networks. Finally, section 5.3 highlights the influential students with the help of clusters and reports the ANOVA results, including the p-values that help to evaluate and compare the clusters.