M ULTI - OBJECTIVITY - Genetic Algorithms in Software Architecture Synthesis

No matter how detailed the calculation of different quality attributes is, combining two attributes into one function is still a bit like summing apples and oranges. In other words, while the fitness value may be accurate, it is not very informative unless the distribution of the different quality attributes is known. A solution which receives 1000 points for efficiency and 0 for modifiability is quite different to one that receives 500 points for both properties – just like having 10 apples is quite a different case to having five apples and five oranges, even though in both cases the ultimate value, be it fitness or the number of fruits, is still the same. Thus, the most natural way to handle conflicting quality attributes, such as modifiability and efficiency, is to use multi-objectivity. This enables the evaluation of individuals from both viewpoints separately, and selection of individuals is made based on these individual values, not the combined sum value.

4.4.1 Method

Multi-objective software architecture synthesis is possible through application of Pareto optimality [Deb, 1999]. Pareto optimal selection

97 evaluates individuals from all viewpoints, and rather than providing one

“optimal” individual, a range of satisfactory individuals is presented.

If solutions are measured according to n properties, a solution x can be described by a vector x = [f1(x), f2(x), …, fn(x)], where fi(x) is the value of ith property in x. For convenience, it can be assumed that all properties should be maximized. When the properties represent conflicting quality criteria, it is unlikely to find a solution in the solution space S which would be optimal with respect to all the quality criteria. In such a situation, Pareto optimality provides a way to compare the solutions.

We say that a solution x’  S is Pareto optimal if for each x  S, we have either fi(x) = fi(x’), for all i = 1, …, n, or there is at least one property i such that fi(x) < fi(x’). That is, x’ is Pareto optimal if there exists no feasible solution x that increases some criterion without causing a simultaneous decrease in at least one other criterion. Typically, there is not a single solution that is Pareto optimal, but a set of Pareto optimal solutions. The Pareto optimal solutions of a set of feasible solutions are said to form a Pareto front.

In order to apply Pareto optimal selection, each quality attribute needs to be evaluated separately. In order to be able to view results in a two-dimensional graph, this study only concentrates on two quality properties, modifiability and efficiency. The fitness for each solution is thus given in the form f(x) = [sf1(x), sf2(x), sf3(x), sf4(x)] (using the notation for sub-fitnesses as given in Subsection 3.1.4). For evaluation, the complete values of both quality attributes are used. The complete modifiability value mf for solution x is defined as mf(x) = sf1(x) - sf2(x) and the complete efficiency value ef respectively as ef(x) = sf3(x)- sf4(x). Each individual thus has two fitness values, mf and ef, instead of just one fitness value where the two quality attributes are summed.

Once the fitness values have been calculated, the individuals are ranked based on modifiability (efficiency could be used as well). The Pareto front is then selected so that the most modifiable individual is naturally included, as it represents one of the extreme choices; this can be given the identifier pf1. The ranked individuals are then processed so that if the potential front individual pfn has an efficiency value ef(pfn) > ef(pfn-1), it is selected for the front (note, that mfn < mfn-1, as the individuals are ranked in descending order according to modifiability values).

As the population contains 100 individuals, the Pareto fronts obtained with the presented approach usually contain approximately five to ten individuals. Hence, when selecting a population, one front is not enough for the genetic algorithm to progress. Thus, in my approach, when the mutations and crossovers for population pn have been performed, the Pareto front of pn is selected for the next generation (population pn+1).

After this, the Pareto front of the remaining individuals, i.e., the Pareto front of population pn \ pn+1, is selected and transferred to the next generation. This iterative process is repeated until the population pn+1 has at least 100 individuals.

Once the evolution process is complete, the outcome is not just one class diagram, but the class diagrams (synthesized architectures) of the entire Pareto front of the final population. This enables the architect to view the different extremes without having to manipulate weights for the fitness function.

4.4.2 Case studies

The case studies were used to study two different aspects: 1) how do the Pareto fronts evolve during the evolution, and 2) what kind of solutions are produced in a front, especially how much does the most efficient solution differ from the most modifiable solution.

Figure 41. Initial Pareto fronts for ehome

Figure 42. Initial Pareto fronts for robo

Firstly, the Pareto fronts of both cases were studied. Figure 41 shows the Pareto fronts of 20 runs for ehome in the beginning of evolution (after 50 generations), and Figure 42 shows the respective Pareto fronts for robo.

Each Pareto front (different run) is represented by a distinctive marker. As can be seen, in the begininning the fronts are quite clustered in the efficient section of the graph, and in Figure 41 (ehome) no individual actually has a positive modifiability value at this point. In Figure 42 (robo) there are some individuals which have positive modifiability, but they are few, and the trend is the same as with ehome, as individuals are quite clustered and focused on the efficient section.

Figures 43 and 44 show how the Pareto fronts have evolved as the figures portray the final Pareto fronts at the end of evolution for ehome and robo, respectively. The Pareto fronts have moved towards the modifiable section of the graph, and the efficient solutions are still quite tightly clustered, while within the modifiable solutions there is much more variance. This is expected, as efficient solutions are close to the null architecture, i.e., have as few mutations as possible, while good modifiability values can be reached with a practically endless number of pattern combinations.

100

Figure 43. Final Pareto fronts for ehome

Figure 44. Final Pareto fronts for robo

101

Figure 45. Example solution for ehome from modifiable end of Pareto front

Secondly, the example solutions for both case studies where inspected.

The final Pareto front of one of the test runs was selected for exemplary purposes for both ehome and robo. The most modifiable solution from this exemplary Pareto front for ehome is illustrated in Figure 45, and the most efficient solution from the same Pareto front is given in Figure 46.

The difference between these two solutions is substantial. In the modifiable solution the message dispatcher style is central, and there are several instances of all low-level design patterns. On the other hand, in the efficient solution the message dispatcher is not present, the number of design patterns is much smaller. Also, in the efficient solution, most of the design patterns are instances of Template Method, which has the least effect on efficiency. This represents quite a typical front for ehome, although several fronts also had the message dispatcher in the most efficient solution as well, but it only had one or two connections, so it was very poorly used.

102

Figure 46. Example solution for ehome from efficient end of Pareto front

As for robo, the most modifiable solution of the exemplary Pareto front is portrayed in Figure 47, while the most efficient individual from that same front is given in Figure 48. The solutions are very similar to those achieved with ehome: in the modifiable solution the message dispatcher is heavily used, and there are several instances of all low-level design patterns. In the efficient solution there are no Strategy patterns, which give the highest penalty in terms of efficiency, and the message dispatcher is not used at all.

103

Figure 47. Example solution for robo from modifiable end of Pareto front

104

Figure 48. Example solution for robo from efficient end of Pareto front

To summarize, the multi-objective approach, implemented using the Pareto optimality concept, provides a way to evaluate each solution from different viewpoints. It also provides several candidate solutions instead of only one. This equips the designer with two kinds of information: 1) by examining the solution and fitness distributions it is easier to see how a chosen quality attribute affects the outcome of the design and 2), by seeing several solutions at once, using one solution as a starting point for the actual design can be done with more confidence, as opposed to using the one solution produced by a single weighted function, when it is not clear why the algorithm suggests that one particular solution.

105 Results from case studies show that the extremes of Pareto fronts are very different, and thus the entire front gives a complete view of what kinds of solutions the algorithm is able to produce. The multi-objective approach is discussed in publication [IX].

106

5 Evaluation

While the previous chapters presented the method for GA-based software architecture synthesis and its different variations, the results still need validation. In this chapter, I will begin by summarizing the results – how different variations affect the outcome and what are the differences between the case studies. I will then present evaluation for the multi-objective approach by examining the individuals of a Pareto front against ATAM-type scenarios. After this, an experimental study is presented to evaluate the results obtained by the (basic) GA approach by comparing the synthesized solutions with human-made solutions. Finally, the different factors affecting the outcome of case studies and the experimental study are discussed, as well as different aspects related to the synthesis that were discovered during the research process but not covered in the publications or case studies.

In document Genetic Algorithms in Software Architecture Synthesis (sivua 96-106)