The Quantifiable Performance Difference of the Implementation

Since this research builds on existing work, comparing the performance of the two versions is straightforward to justify. Evaluating the performance is not only relevant for justifying the transformation; it also helps identify the causes that slow down simulations. Because the standpoint of the development was to integrate the Simulink CFB-model with APROS, most changes to the CFB-model design were made to improve usability or to enable integration, not to optimize. The quantitative performance comparison also serves to evaluate the run-time performance of the newly developed VTT CFB-model (Figure 1, step 4).

The quantitative test case used to compare the performance of the two simulations was to run the simulation forward for 200 seconds with predefined tolerances, input, and initial state. The performance difference was measured on a single computer, using the stand-alone model configuration for the Simulink run and the server-client approach for the service-driven APROS integration case. Static analysis of the models showed that the initial model has 935 individual components, whereas the new version has 417 (Table 12). Beyond that, the key technical details identifying the two models are the numbers of states, inputs, outputs, and sample times (Table 12).

On closer inspection (Table 12), the only variable that has stayed the same is the number of states.

However, the general structure has been re-engineered throughout to satisfy the integration criteria. For instance, the input and output structures of the model were changed from a signal-based to a hierarchical design, allowing a single output to contain the 56 output signals previously emitted individually. The single input now shares a similar characteristic, containing roughly five times as many signals as the output array (Table 12).

Table 12. Simulink model differences

Component                      Count: Benchmark    Count: Service
Number of continuous states    1821                1821
Number of inputs               0                   1
Number of outputs              56                  1
Number of sample times         4                   3
Total blocks                   935                 417
Constants                      268                 251
MATLAB Function                1                   2
S-Function                     2                   1

As such, the setup resembles the traditional abstract design of the system (Figure 6).¹⁰

10. The main reason for using the hierarchical input and output is an integration-specific problem: the hierarchical signal design means that only a single pointer is needed to access the entire input or output.
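
As the footnote notes, in generated C code a hierarchical (bus-style) input or output typically maps to a single structure, so the entire signal set is reachable through one pointer. The sketch below illustrates only the idea; the type, field names, and sizes are hypothetical and are not the actual generated CFB-model code.

```c
/* Hypothetical sketch of a hierarchical output reached through a single
 * pointer; the struct layout and names are illustrative only and are not
 * the actual generated code of the CFB-model. */
#include <stdio.h>

#define NUM_OUTPUT_SIGNALS 56           /* the 56 signals previously emitted separately */

typedef struct {
    double signals[NUM_OUTPUT_SIGNALS]; /* all outputs bundled into one bus */
} ModelOutput;

/* A consumer, such as the integration layer, needs only one pointer. */
static double read_signal(const ModelOutput *out, int index)
{
    return out->signals[index];
}

int main(void)
{
    ModelOutput out = { { 0.0 } };
    out.signals[0] = 1.23;              /* hypothetical value */
    printf("signal 0 = %f\n", read_signal(&out, 0));
    return 0;
}
```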

The performance estimation is based on CPU time, as it is a metric available to both the C language and Simulink. The performance test yielded the following results:

• Total time: 200 s,

• Clock precision: 0.00000005 s,

• Clock speed: 2195 MHz, and

• Total performance difference: 4.3267621.

There is a fundamental difference in execution time between the two models. The total-time results were obtained using the C-language-native time.h library. To isolate the key differences between the benchmark and the service implementation, the Simulink profiler was used. Based on the breakdown (Figure 23), the main simulation process can be divided between the autonomous solver and the simulated source model (Figure 14).
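
A minimal sketch of such a CPU-time measurement with the C-native time.h library is shown below. The simulate_step() routine is a hypothetical placeholder for the model step function, and the loop merely reuses the 200-second span and 0.2-second sample rate mentioned in the text.

```c
#include <stdio.h>
#include <time.h>

/* Placeholder for the generated model step routine, which is not shown in
 * the text; a real run would advance the CFB-model by one step here. */
static void simulate_step(double t)
{
    (void)t;
}

int main(void)
{
    const double t_end = 200.0;   /* simulated time span used in the test (s) */
    const double dt    = 0.2;     /* sample rate used in the test (s) */

    clock_t start = clock();      /* CPU clock at simulation start */
    for (double t = 0.0; t < t_end; t += dt) {
        simulate_step(t);
    }
    clock_t end = clock();        /* CPU clock at simulation end */

    printf("CPU time: %.6f s\n", (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}
```

The reported total performance difference is then obtained by comparing the CPU times measured for the two runs.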

The CPU time used (Figure 23) is divided into nine categories:

Figure 23. Simulation performance difference - breakdown to components

• Total time to process the model,

• Total simulation time - model and solver,

• Solver phase,

• Integration - Solver specific,

• Jacobian transformation - solver specific,

• ODE 15s - solver specific,

• Derivation - model specific functions,

• Output - model specific function, and

• Simulation start / initiation / termination - run-time preparation.

Out of the nine categories, the total time to process the model, the total simulation time, the run-time preparation, and the solver phase are aggregates. The nine categories can also be classified further into two groups: solver-specific and model-specific. The solver-specific class is autonomous.

This means that the resources its instances consume are the effect of something else. The model-specific class instances, on the other hand, are the cause that generates the work for the solver-specific instances. With this constraint, it is possible to pinpoint the main cause of the performance difference between the two versions of the same simulation. Compared to the process of producing the output, the resources dedicated to solving the state are many times greater for the benchmark than for the service implementation. The second graph (Figure 24) shows how significantly the solver phase burdens the benchmark simulation compared to the service implementation.

Figure 24. Simulation performance difference - breakdown to components

The routine dedicated to solving the derivative function is the main cause of the difference in performance between the two models. Since the same solver settings were used, the solver-specific resources only add a cumulative effect to the performance difference, which stems from the solver being overburdened. The contributing solver instances are the integration (solver routine), the Jacobian transformation (Equation 2.1), the ODE 15s solver (solver type), and the solver phase. Together, these components account for the main performance difference between models capable of producing the same number of output states.
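
As a generic illustration of why this happens (this is the standard stiff-solver mechanism, not the thesis's Equation 2.1 itself): an implicit solver such as ODE 15s repeatedly evaluates the derivative function and its Jacobian inside a modified Newton iteration, so any extra cost in the model-specific derivative routine is multiplied within the solver-specific categories:

$$
J(t, x) = \frac{\partial f}{\partial x}, \qquad (I - h\,\gamma\, J)\,\Delta x = -\,r(x),
$$

where f is the model's derivative function, x its state, h the step size, γ a method coefficient, and r the residual of the implicit step equation.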

Besides providing the opportunity to measure the performance difference, the test also allowed comparing the output of a given signal between the two runs. The signals were recorded with a given input and initial state; with a 0.2-second sample rate, the output remained the same for the full 200 seconds (Figure 25). The output signals produced by APROS and Simulink running the same model are identical (Figure 25). The graph contains two signals measuring the same output in the runs made for the benchmark and the service (Figure 25).

Figure 25. Validation of Simulink-APROS output

However, running the integrated CFB-model as part of a larger simulation in APROS (Figure 26) is likely to come with its own challenges. For problems related to the use of the inter-platform system, simply comparing the output is not enough. Isolating random errors, however, is an entirely separate process, one that is not covered by this research. Techniques such as residual analysis can nevertheless help detect these types of issues.
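
As a sketch of the residual-analysis idea mentioned above, assuming the two runs are available as equally sampled arrays (the array names, length, and tolerance below are hypothetical), the point-wise residuals can be computed and any samples exceeding a tolerance flagged:

```c
/* Sketch of a simple residual check between two equally sampled output
 * signals, e.g. a benchmark run and a service run. Array names, length,
 * and tolerance are hypothetical. */
#include <math.h>
#include <stdio.h>

#define N_SAMPLES 1001        /* 200 s at a 0.2 s sample rate, plus t = 0 */

static int check_residuals(const double *benchmark, const double *service,
                           int n, double tol)
{
    int violations = 0;
    for (int i = 0; i < n; ++i) {
        double residual = benchmark[i] - service[i];
        if (fabs(residual) > tol) {
            printf("sample %d: residual %.6e exceeds tolerance\n", i, residual);
            ++violations;
        }
    }
    return violations;
}

int main(void)
{
    static double benchmark[N_SAMPLES];   /* would hold the recorded signals */
    static double service[N_SAMPLES];
    int bad = check_residuals(benchmark, service, N_SAMPLES, 1e-9);
    printf("%d samples outside tolerance\n", bad);
    return 0;
}
```

In practice, the threshold check could be replaced with statistics of the residual series (mean, variance, drift), depending on the type of issue being isolated.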

The generalizable insight into how the service approach forces a more efficient style of programming originates from the type of main (stand-alone) method that is required for the service to operate a simulation. In this research, the requirements a simulation must meet for service implementation are divided into four categories: slave, tenant, master, and merged (Table 11). To meet a specific category, the simulation main routine must execute accordingly and have access to public input, output, time, and state variables.
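
A minimal sketch of what such a main routine's public interface could look like is given below; the names, sizes, and single-step structure are assumptions for illustration, not the thesis's actual implementation. The point is that the input, output, time, and state are reachable from outside the stepping routine, so a service acting in any of the four roles can drive the simulation.

```c
/* Hypothetical sketch of a simulation exposing public input, output, time,
 * and state, so an external service (e.g. in a master or slave role) can
 * drive it step by step. Names and sizes are illustrative only. */
#include <stdio.h>

#define N_STATES 4            /* placeholder; the real model has 1821 states */

typedef struct {
    double input;             /* public input */
    double output;            /* public output */
    double time;              /* public simulation time */
    double state[N_STATES];   /* public state vector */
} Simulation;

/* One externally callable step: the caller (the service) controls pacing. */
static void simulation_step(Simulation *sim, double dt)
{
    /* a real model would integrate its equations here */
    sim->output = sim->state[0] + sim->input;
    sim->time  += dt;
}

int main(void)
{
    Simulation sim = { 0 };               /* all fields zero-initialized */
    for (int i = 0; i < 10; ++i) {
        simulation_step(&sim, 0.2);       /* a service would call this remotely */
    }
    printf("t = %.1f s, y = %f\n", sim.time, sim.output);
    return 0;
}
```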

Implementing the design principles for the executing routine forces cumulative change throughout the simulation source, promoting the distinction between global and local variables, encapsulation, and hierarchical design principles. Because of these characteristics, the reusability of applications built according to the general service design principles is high (Balci, Arthur, and Ormsby 2011, page 161, Figure 2).

Figure 26. Output visualizations for CFB-model at APROS

As a positive side effect, the approach of favoring service design principles can also improve the quantitative performance of the underlying process. However, the improvement in performance is not directly correlated with the service paradigm. Furthermore, the relative increase in development difficulty when using the service design paradigm is likely to cost more than the gain in performance alone.