• Ei tuloksia

S EARCH -B ASED S OFTWARE D ESIGN

Search-based software engineering has applied meta-heuristic search algorithms to software engineering problems since the ‘70s [Clarke et al., 2003]. Problems in the area of testing, especially in automated generation of test cases, have been most studied [Harman et al., 2009]. However, the area most relevant to this thesis is the field of search-based software design (SBSD). In SBSD, search algorithms are used to tackle problems related to different stages of software design: assigning methods to classes, designing the architecture with patterns or styles, QoS-optimization on service-oriented architectures, and re-design. While subjects more related to re-design (i.e., maintenance or re-engineering), such as clustering (e.g., [Di Penta et al., 2005; Doval et al., 1999; Harman and Tratt, 2007; Seng et al., 2005]) and refactoring (e.g., [Du Bois and Mens, 2003; O’Keeffe and Ó Cinnéide, 2006, 2008; Seng et al., 2006; Quaum and Heckel, 2009]), have been studied for a couple of decades, now also problems regarding direct software design, such as solving the class responsibility assignment (CRA) problem and applying design patterns, have attracted more interest. The most prominent studies in SBSD published (and electronically available)

47 by December 2009 are described in detail in publication [VI]. More recent studies are briefly discussed here.

2.3.1 Class design

Simons et al. [2010] study using evolutionary, multi-objective search and software agents to aid the software architect in class design. Use cases are used as a starting point. Actions (verbs) in the use cases are transformed into methods, and data (nouns) is transformed to attributes. The methods and attributes are then grouped into classes. Each class should have at least one method and one attribute, and no method or attribute can be in more than one class. One individual (solution) is thus the design containing all methods and attributes (and their class distribution). For mutation, a set of methods and/or attributes is selected and relocated to a different class. Crossover is applied so that the attributes and methods of two classes are swapped. Coupling and cohesion are used to calculate fitness.

The evolutionary search is performed by a software agent. Simons et al.

suggest that a global multi-objective search is unnecessary, and the search should be narrowed towards the “most useful and interesting candidate designs”. They attempt to achieve this by isolating discrete zones from the search space, and then using a local search within these zones. Local search is conducted using a single-objective genetic algorithm, which only considers coupling in the fitness calculations. The designer is then presented with the results of these local searches. The designer can manually specify diversity thresholds to control when a zone is isolated, or an agent can do it automatically.

Bowman et al. [2010] have applied a multi-objective genetic algorithm (MOGA) to assist in the CRA problem. The CRA problem is similar to the class design problem studied by Simons et al., as it attempts to solve the assignment of responsibilities (methods) into classes and the interaction between classes. Bowman et al. also use coupling and cohesion to measure the fitness of their solution, and aim rather at providing interactive feedback to a designer than at producing a whole design. Their MOGA takes a class diagram as input, as well as user-defined constraints on what can and cannot change in the class diagram. The class diagram is then evaluated, and possible improvements are suggested. The MOGA ultimately provides alternative solutions to the user.

2.3.2 Low-level architecture transformations

Jensen and Cheng [2010] present an approach based on genetic programming (GP) for generating refactoring strategies that introduce

48

design patterns. The authors have implemented a tool, REMODEL, which takes as input a UML class diagram representing the system under design.

The system is refactored by applying “mini-transformations”: abstraction, abstract access, partial abstraction, delegation, encapsulating construction, and wrapping. The encoding is made in tree form (suitable for GP), where each node is a transformation. A sequence of mini-transformations can produce a design pattern. A subset of the patterns specified by Gamma et al. [1995] is used: Abstract factory, Adapter, Bridge, Decorator, Prototype and Proxy. Mutations are applied by simply changing one node (transformation), and crossover is applied as exchanging sub-trees.

The QMOOD [Bansiya and Davis, 2002] metrics suite is used for fitness calculations. In addition to the QMOOD metrics, the authors also give a penalty based on the number of used mini-transformations and reward the existence of (any) design patterns. The output consists of a refactored software design as well as the set of steps to transform the original design into the refactored design. This way the refactoring can be done either automatically or manually; this decision is left for the software engineer.

2.3.3 High-level architectural transformations

Praditwong et al. [2011] introduce a multi-objective approach for automated software module clustering, as well as two formulations: the equal-size cluster approach and the maximizing cluster approach. The equal-size cluster approach favors clusters that have on average the same number of modules while the maximizing cluster approach favors a minimal amount of clusters with only one module. A two-archive Pareto optimal GA is mainly used, but a single objective hill climbing algorithm is also implemented for comparison. Their primary findings suggest that in order to have solutions with high cohesion and low coupling, the equal-size cluster approach to the multi-objective problem produces the best results overall. This study is an example of the several studies in clustering, which ultimately work more on the refactoring area.

Martens et al. [2010] present an approach which attempts to automatically improve a given architecture model with respect to performance, reliability, and cost. The approach is best suited for component-based software architectures. Martens et al. they attempt to optimize four degrees of freedom: processor speed, number of servers, component allocation and component selection. It is assumed that components with the same interface provide the same functionality, and thus no attention to the functionality of the system is needed. The approach is implemented in the PerOpteryx tool. The tool requires as input a component-based architecture model with performance, reliability and cost annotations. The tool then searches for Pareto optimal candidate solutions. When mutating,

49 one or several design options (degrees of freedom) are varied. In crossover some of each candidate’s design option values are taken and combined.

Solutions that violate quality requirements are eliminated during selection.

After elimination, only the Pareto optimal solutions are kept. The authors list a significant amount of limitations to their approach, such as questionable efficiency, no regard for uncertainties, limited degrees of freedom, simplistic cost model and limited genetic encoding (an array of choices).

Aleti et al. [2009] present the ArcheOpterix tool. It is an Eclipse plug-in that provides a platform to implement different architecture evaluation and optimization algorithms, and uses AADL as the architecture description language. So far, only deployment metrics (data transmission reliability and communication overhead) have been implemented for evaluation, and the authors present an example of optimizing a deployment architecture. ArcheOpterix uses Pareto optimality as a primary method for finding best solutions, but a single-objective fitness function is also implemented. When a near Pareto front has been found, ArcheOpterix draws the near Pareto front line, which contains all the non-dominated solutions found by the algorithm.

2.3.4 Comparison to presented work

The studies by Simons et al. [2010] and Bowman et al. [2010] are close to the approach presented here, as in the case of Simons et al., the direction of design is clearly upstream, and with Bowman et al. there is also potential for upstream design. However, both stay at class level, and do not consider how the actual classes interact. Also, both only use coupling and cohesion as metrics, which is natural when dealing with “simple”

class structure only, but insufficient if, e.g., interfaces are considered, as their value is not apparent if the fitness relies solely on these simple metrics.

While Bowman et al. [2010] and Simons et al. [2010] study lower level design than what is discussed in this thesis, the studies by Praditwong et al., [2011], Martens et al. [2010] and Aleti et al. [2010] operate on a significantly higher level. These studies assume that the class level design is complete and concentrate on components. Martens et al. [2010] even assume that they assume that interchangeable components produce similar functionality, which is a fair assumption given that they attempt to optimize the architecture in terms of performance, reliability and cost, and the main methods of doing this is reallocating components to servers and considering the number of servers used. However, this assumption makes it clear that the designs of the actual components are not under consideration. As for the study by Aleti et al. [2010], they have

50

implemented a tool which should be capable of using different evolutionary approaches, fitness functions and architectures, but only provide an example for a deployment architecture. So, at this point it seems that they are not interested in the class level design of the architecture. Finally, the study by Praditwong et al. [2011] is an example of the many studies made in software clustering. It differs from the approach presented in this thesis both on the direction of the design and on level of detail, as studies in clustering rarely consider what the modules contain and how the clusters are ultimately connected, but are only interested whether they are connected or not.

The approach by Jensen and Cheng [2010] is the most similar to the one presented in this thesis. However, they clearly have a refactoring oriented approach instead of pure upstream design. They also construct design patterns piece by piece through mini-transformations, instead of applying whole patterns. The existence of patterns, no matter what kind, is, however, greatly rewarded. Obviously, there is a risk that a large number of incomplete patterns is left in the architecture, and that the patterns are not applied in the best places.

To summarize, the more recent approaches give a fairly good view of the overall status of search-based software engineering. Upstream approaches are still scarce, and the present studies only change the design in very small steps. However, when a good architecture “simply” needs to be optimized, there are many studies on different levels on how to do that automatically. Using design patterns is one of the most recent trends, but no other study applies them to upstream software architecture synthesis in a manner similar to the one used in this thesis.

51

3 Genetic Software

Architecture Synthesis

In this chapter the actual method for genetic software architecture synthesis is described, and implementing the synthesizer is discussed.

Experimental results with this approach have been presented in publications [I], [II] and [VIII]. Two case studies were used in the experiments: the ehome, and a robot war game simulator, called hereafter robo. The approach is, however, applicable for any system in principle. I will here describe the method on a general level, and use ehome as a running example to demonstrate how the synthesis process works in practice. Results from case studies are briefly presented. Also, I will briefly present the Darwin tool, which provides a user interface for software architecture synthesis with the presented algorithm. The tool support is discussed in more detail in publication [V].