• Ei tuloksia

Fitness and selection

3.1 M ETHOD

3.1.4 Fitness and selection

At this point (mutations and crossover having been applied) the population is larger than in the beginning of the generation, as all the children produced by crossover have been added to the collection of chromosomes. The concept of natural selection now steps in, as the weakest individuals must be discarded from the population. This is done by first evaluating each individual and then performing selection.

As discussed in Subsection 2.1.3, evaluating software architecture quality is very difficult, and several viewpoints should always be considered. For

66

the GA the evaluation must be done using some kind of metrics, as the algorithm has to be able to straightforwardly order the architectures according to their quality. I have used three different quality attributes to evaluate the synthesized solutions: modifiability, efficiency (performance) and complexity. Modifiability is a natural choice, as the main purpose for which the design patterns are used is to increase the maintainability and modifiability of a system. Efficiency, in turn, is a counter attribute to modifiability, as many modifiability-enhancing modifications to architectures have a direct (negative) effect on efficiency. Thus, when examining the power of the algorithm, it is natural to concentrate on how much the algorithm manages to increase modifiability (from the zero state of null architecture). Having efficiency as a counter-weight makes sure that the algorithm is given some kind of restrictions, and can not wildly apply design patterns at every possible location. Complexity is used to ensure that modifiability has a contradicting quality attribute also in those cases where there is no clear efficiency drawback (e.g., in the case of applying a Template Method pattern).

Constructing the fitness function began by inspecting different software metrics and what properties they measure. The CK metrics (the ones suitable here, i.e., disregarding class hierarchies) were chosen as the base line for the fitness function. The CK metrics were chosen because applying them did not require any information of the semantics of operations, and clearly focused on class structure only. These metrics were then refined and extended so that they also consider the solutions that are not defined in the original metrics, such as the message dispatcher and server. While metrics were being defined, they were grouped according to whether they have a positive or negative impact on modifiability or efficiency.

Complexity is evaluated with a simple formula. Thus, the actual fitness is composed of five sub-functions: positive modifiability, negative modifiability, positive efficiency, negative efficiency and complexity.

Negative sub-functions are given coefficient -1, as they should always give values below zero. The total values for modifiability and efficiency are thus the sums of their respective sub-functions. Each sub-function is normalized so they have the same range. Each sub-function is also given a weight, so if one wants to emphasize one quality attribute over another, it can be assigned a larger weight.

The fitness function has constantly evolved during the research process.

However, after several iterations, the fitness function is defined as follows.

When wi is the weight for the respective sub-function sfi, the core fitness function fc(x) for solution x can be expressed as

fc(x) = w1sf1 – w2sf2 + w3 sf3 – w4 sf4– w5 sf5.

67 Here, sf1 measures positive modifiability, sf2 negative modifiability, sf3

positive efficiency, sf4 negative efficiency and finally sf5 measures complexity. The sub-functions are defined as follows (|X| denotes the cardinality of X):

sf1 = (|calls to interfaces| * ∑ (variabilities of operations called through interfaces)) + (|calls through dispatcher|)  ∑ (variabilities of operations called through dispatcher)) – |unused operations in interfaces|  β ,

sf2 = |calls between operations in different classes, that do not happen through a pattern|* ∑ (variabilities of called operations) + |calls between operations in same class|* ∑ (variabilities of called operations) * 2,

sf3 = ∑ (|operations dependent of each other within same class|  parameterSize) + ∑ (|usedOperations in same class|  parameterSize + |dependingOperations in same class|  parameterSize),

sf4 = ∑ ClassInstabilities + (2 * |dispatcherCalls| + |serverCalls|) 

∑( frequencies of operations called through dispatcher or server) +

|calls between operations in different classes|, sf5 = |classes| + |interfaces|.

The multiplier β (β > 1) in sf1 means that having unused operations in an interface is almost like breaking an architecture law, and should be more heavily penalized. It should also be noted that in sf1, most patterns also provide an interface. In sf3, “usedOperations in same class” means a set of operations in class C, which are all used by the same operation from class D. Similarly, “dependingOperations in same class” mean a set of operations in class K, which all use the same operations in class L.

ClassInstabilities measures the relation of calls between and within classes [Amoui et al., 2006].

As stated, the fitness function has evolved during the research process.

However, the core of the fitness function (division into five sub-functions) has stayed the same. Most alterations have been to coefficients and some minor calculations. One of the notable changes concerns the impact of the message dispatcher and its connections on positive modifiability (sf1). In the experiments discussed in publications [I-V], [VII] and [VIII], sf1

calculated the product of the (variabilities of) connections instead of the sum, as given above. This was changed in the final version of the fitness function, as in certain circumstances (discussed in Chapter 4), modifiability significantly overpowered the fitness value due to using the product of connections. Due to a typing error the fitness function formula

68

presented in publications [I-IV] suggests that the sum of (variabilities of) connections is used. In fact, the product of message dispatcher connections has been used in all other publications (with experiments) except publication [IX] and this thesis, where the latest version of the fitness function is used. This error, however, does not affect the presented results, which are based on complete (sub)fitness values, as the actual coded fitness function has been essentially the same for all publications [I-V], [VII] and [VIII].

It should be noted that all the patterns actually increase modifiability and decrease efficiency and complexity. The architecture is most efficient and simplest at null architecture stage, as there is minimal amount of classes and interfaces and connections between them. Also, no high-impact design choices, such as the message dispatcher, are present at this stage.

Thus, when the synthesis progresses and patterns are added, efficiency and complexity values for the architectures decrease while the modifiability value increases (vice versa, if patterns are removed, modifiability will decrease while the other quality values increase).

Figure 18. Example fitness graph for ehome

An example fitness curve for ehome achieved with the fitness function defined above is given in Figure 18. This is the average fitness curve of 20 runs, and displays how the average fitness of the ten best individuals (the elite) in each generation develops. The fitness curves for different quality attributes have been extracted to show what kind of balancing act it is to find architectures that have good quality from all viewpoints.

After each individual has been evaluated, they are ordered according to their fitness values. The very top of the population, the elite, is transferred

69 straight to the next generation. For the others a rank-based roulette wheel selection is made. Each individual is given a slice which corresponds in size to the rank the individual has in the population. A rank-based selection is more realistic, as the fitness function hardly gives absolute values in terms of how much “better” one individual is compared to another. After each spin of the wheel the selected individual is moved to the next generation and the sizes of slices are adjusted so that there are always as many slices in the wheel as there are individuals still left in the

“old” population. After 100 (as many as in the initial population) individuals have been selected, the rest are discarded.

The process of mutation, crossover, fitness evaluation and selection is repeated for a defined number of generation; in this thesis 250 generations is used as the default length of an evolution. After the evolution is finished, the best solution of the last generation is given as a class diagram.