• Ei tuloksia

PART III: DEVELOPMENT AND EVALUATION OF MODEL IDENTIFICATION

8.2   S PATIAL AND G RAPH R EPRESENTATIONS OF N ETWORK N ODES

The statistical significance value of a dependency measure, generally denoted here for two node variables and by , , is interpreted as a similarity value and transformed into a dissimi-larity value , as , 1 , . Dissimilarity values for all 1 /2 node pairs are given as a symmetric, size  , dissimilarity matrix. In principle, these dissimilarities can be perfectly “visually” represented in an 1 dimensional space as a set of nodes, where the Euclidean distances between the nodes correspond to their dissimilarities. However, by ap-plying the multidimensional scaling methods in Chapter 5 to the dissimilarity matrix, we can re-duce this “visual” representation to a -dimensional approximation ( 1), where the true dissimilarity , is represented as closely as the dimensionality allows by a continuous-valued distance , between the pairs of nodes.

To obtain a topology, or a graph presentation, of the nodes, the node dissimilarities represented by a spatial node location map must be turned into binary relations, a set of neighbourhood sets.

Here we adopt a uniform thresholding procedure with the distance threshold defined for

SR , , SR ,

Γ 1 2⁄

2 πΓ 1 2⁄ 1

| |

. (8.3)

KT , , 2 exp 1

2

| , |

, (8.4)

Figure 8.1. Thresholding of a node location map into a graph structure. The construction of a neighbourhood for node is demonstrated when the threshold distance is defined by . Nodes within the threshold distance from are neighbours to .

thr

pairs of nodes on the node location map. The nodes within the threshold distance from one an-other are considered neighbours; i.e., if , for nodes and , then and

. An example of this procedure is shown in Figure 8.1, where the neighbours of node are those within the distance from , and denoted here by undirected links. Although with a given threshold distance the graph structure is unique for a location map, an infinite number of location maps may correspond to a single graph structure, because small changes in node dis-tances do not necessarily affect the thresholding and thus the neighbourhood relations. For con-venience, this graph construction is abbreviated below as the MGMN (MDS-based Graph esti-mation for Markov Networks) method (sketch of it shown as Algorithm 8.1).

8.2.1 Uncertainty and the Effect of the Threshold Distance

The uncertainty of an estimated node location map and its corresponding graph structure is re-lated mainly to the size of the data set, from which node similarities are estimated. Because the node location map is a -dimensional approximation of the original dissimilarities, this approxi-mation also introduces some further uncertainties into graph estiapproxi-mation. Furthermore, if nodes and are neighbours on a graph, the uncertainty related to this neighbour relation can be seen as depending on the gap between the threshold distance and the distance , ; i.e., the smaller the value , on the node location map, the more uncertain the corre-sponding graph link. In addition, the larger the being used, the more uncertain the neighbour relations included. With a small , only the most certain neighbour relations are included, though many true neighbours may be excluded.

Because is important in graph estimation, it may, when chosen incorrectly, later in the mod-elling phase drastically affect the qualitative properties of an MRF model. If is chosen far too large, the graph is too strongly connected, resulting in false coherent behaviour in the model.

If chosen far too small, the graph is too loosely connected, and truly coherent behaviour may be lost in the model. However, choosing slightly incorrectly is not expected to incur these dras-tic effects; on the contrary, to affect qualitative model behaviour in such a fashion, must be

Algorithm 8.1. Sketch of the MGMN method.

1. Initialise   as the set of all node variables:  ; 1, … , :  

2. Initialise the adjacency matrix   as an unconnected graph:  ,  :  ,   0  3. Estimate  ,  for all node pairs  , ;  ,  

4. Calculate  ,  for all node pairs  , ;  ,  as  , 1 ,  

5. Form an   dissimilarity matrix   so that  , ,  

6. Apply MDS to   to obtain a low‐dimensional spatial representation with node locations    for all nodes  ; 1, … , : , and with internode distances  ,  for all node 

pairs  , ;  ,  

chosen quite poorly. Some approximate information, e.g., the number of neighbours nodes should typically have, is usually available for choosing an appropriate .

If no information is available for choosing , we may try various threshold values and com-pare the model identification results with the varying threshold values to each other. Yet again, usually no reference is available, to which the resulting graph could directly be compared. How-ever, predictions with the MRF model with the obtained graph structure and parameters can be evaluated by comparing the model predictions to data. Uncertainties in estimated model parame-ters can also be studied, and such a threshold distance chosen as to minimise these uncertainties.

Model predictions and uncertainties are discussed further in Chapter 9 in connection with MRF model parameter identification.

8.2.2 Advantages and Limitations of Graph Construction

A data-based graph estimation scheme is valuable when the true system topology is not known on the basis of domain knowledge, when several pieces of inconsistent topology information exist, or when some other uncertainty about the system topology exists. When no information at all is available about the system topology, the data-estimated graph structure is the best estimate available for such topology. In case of some prior information about a mostly unknown system topology exist, a graph estimate can be applied by complementing it with the prior knowledge by simply adding or removing links. If the system topology is known but partly uncertain, a graph estimate can be used to evaluate the prior topology information and to complement the uncertain parts.

Some networked systems, such as MTNs, may contain several pieces of topology information.

MTNs embrace both physical and logical topologies, as discussed in Chapter 2. BTSs and their cells (nodes) interact mostly through logical connections, but also their physical topology is im-portant, because, e.g., the handover between two cells depends on their physical locations and thus affects the states of the respective nodes. Because information from several topologies may be partly inconsistent and partly overlapping, it may be difficult to obtain a single consistent pology to MRF modelling. We could simply choose one of many topologies, but important to-pology information would then be wasted. However, as the effect of all topologies should be seen in node state data, at least on the scale they matter, the data-estimated graph structure should manifest the joint impact of all these topologies. In addition, the weightings of the to-pologies in the estimated graph structure should reflect the impact of each topology on the op-eration of the system. Consequently, we propose that the graph estimate be seen as the best sin-gle topology estimate for the MRF modelling purposes.

In two obvious and extreme situations the MGMN method does not work. The first arises when the nodes are completely independent and all dissimilarities equal one. All nodes should be equi-distant apart from each other, corresponding to the distance of total dissimilarity. This corre-sponds to a graph with no links. The second, an opposite situation, obtains when all nodes are highly coherent and dissimilarities equal zero, a situation corresponding to a fully connected graph with links between every node.