
5 Results and Analysis

This chapter presents the author's work to answer the research questions. The current state analysis indicated that the first research question could be answered by basing the prototype on an ontology, since ontologies inherently provide a common vocabulary between people and software agents. Because the ontology is based on the CIM Schema, software agents can collect the CIM Schema information provided by network equipment manufacturers and communicate seamlessly with the ontology.

Nevertheless, the second research question, on maintainability, must be examined in more detail. Therefore, the quality of the ontology was assessed in terms of compliance, modularity, reusability, and analysability to identify strengths and weaknesses, using the corresponding sub-questions:

▪ Does the ontology fully meet the original requirements defined in the competency questions?

▪ Is the ontology intuitive and easy to describe and understand?

▪ Can components be modified without affecting other components?

▪ Can the ontology be easily extended beyond network devices?

▪ How easy is it to detect deficiencies, inconsistencies, and components that need to be repaired?

The definition, planning, implementation, and testing of the prototype are described in detail in the following sections. Although ontology development is an iterative process, the scope of the thesis was limited to the first iteration, and improvements were left for future projects.

5.1 Defining the Ontology Requirements

The ontology should be as simple as possible so that it is easy to maintain, but it must still meet the requirements and contain enough information to be useful. The SPARQL query language enables federated queries, so the CM data can conveniently be divided into several separate graph databases. The number of CIs can be further reduced by using the collective CI pattern, as some CIs may be excluded from individual traceability. Based on the above, the requirements were defined by the domain experts in an internal workshop, where the scope of the ontology was determined using the CI diagram in Figure 21 and the competency questions in Table 6. The naming policy for the classes is derived from the CIM Schema, but the exact definitions of the classes may differ from those in the schema.
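As an illustration of such a federated query, the sketch below joins managed system elements in a local graph database with location data held in a separate one; the prefix, the property name, and the endpoint address are placeholders rather than part of the prototype.

PREFIX : <http://example.org/cm#>

SELECT ?element ?location
WHERE {
  # the local graph database holds the managed system elements
  ?element a :ManagedSystemElement .
  # the location data is fetched from a remote store via SPARQL 1.1 federation
  SERVICE <http://example.org/locations/sparql> {
    ?element :hasLocation ?location .
  }
}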

Figure 21. Initial CIs whose content is explained at a general level.

The managed element is an abstract class that provides the common superclass and corresponds to the SOI, while components identified by serial numbers are managed system elements. Examples of managed system elements are software, operating systems, hardware, and components. The field replaceable unit (FRU) is a manufacturer-defined collection of physical elements used to support, maintain, or update the managed element at the customer location. The product is a concrete class that combines physical elements and a software identity and is acquired as a single entity. The software identity provides descriptive information about a software component for tracking resources and managing installation dependencies, while physical elements are concrete controlled elements of the system that have a physical manifestation. The Software Package Data Exchange (SPDX) [47] specification is a standard format for identifying components, licenses, and copyrights associated with a software package, and the software identification (SWID) tag [48] is a standardized XML file that contains software product identification and management information.


Table 6. The initial competency questions.

#   Question                                                    Answer
1   Is software identity a managed system element?              yes
2   Is physical element a managed system element?               yes
3   Is software identity a physical element?                    no
4   Is location a managed system element?                       yes
5   Is product a managed element?                                yes
6   Is FRU a managed element?                                    yes
7   Is managed system element a system element?                 yes
8   Is location a managed system element?                       yes
9   Is product managed element or managed system element?       managed element
10  Is FRU managed element or managed system element?           managed element
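Questions of this form translate naturally into SPARQL ASK queries over the class hierarchy, which is how they reappear as test cases in Section 5.3.1. As an illustration only (the prefix is a placeholder, and each ASK is a separate query), questions 1 and 3 could be checked as follows:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://example.org/cm#>

# Question 1: Is software identity a managed system element? (expected: true)
ASK { :SoftwareIdentity rdfs:subClassOf+ :ManagedSystemElement }

# Question 3: Is software identity a physical element? (expected: false)
ASK { :SoftwareIdentity rdfs:subClassOf+ :PhysicalElement }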

5.2 Planning and Implementing the Ontology

The prototype development process in Figure 22 is based on test-driven ontology development. The process includes defining the user requirements in the form of competency questions, test-driven ontology development, testing the graph database, and answering the competency questions.

Figure 22. The prototype development process.

The prototype was implemented using two frameworks, as shown in Figure 23: Protégé was used for ontology development and Eclipse RDF4J [49] for graph database testing. Protégé is a set of tools for building domain models and knowledge-based applications with ontologies. Apache Maven [50] is a software project management tool based on the concept of a project object model, and Eclipse RDF4J is a Java framework for handling and processing RDF data, including creation, parsing, scalable storage, inference, and querying of RDF and linked data. Cellfie [51] is a Protégé extension for creating ontologies from spreadsheets, while OntoGraf [52] and VOWL [53] support interactive navigation and visualization of ontology relationships. HermiT is the first publicly available OWL reasoner based on the hypertableau calculus [54].

The RDF4J native store is a transactional graph database that uses direct disk IO, making it a more scalable solution than memory storage, with a smaller memory footprint and better integrity and durability. Finally, GitHub [55] provided version control and source code management for the development project.

Figure 23. The architecture of the prototype.

The starting point for the development of the ontology was the definition of classes using the CIM Schema classes and the competency questions. The class hierarchy was then developed from top to bottom, assuming an "is-a" relationship between classes, as shown in Figure 24. It is important to note that all subclasses of a class inherit the properties of that class. Defining the primitive class hierarchy was trivial because it accurately reflects the competency questions and the hierarchies of the CIM Schema classes. Nevertheless, two issues required more reflection and were not entirely self-evident. Firstly, the Location class was moved down in the hierarchy because the ManagedSystem to which it was originally connected was at too abstract a level to be associated with location information. Secondly, another cumbersome class was the Product, because its information content overlaps somewhat with other information systems. Although Product may be unnecessary, it remained in the ontology for the time being because the warranty date was considered important information, and the expiration date of the software license could later be added to the class. It is noteworthy that owl:Thing is the concept that ranks above all other classes and must, by definition, be at the top.

Figure 24. The class hierarchy.
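Expressed in SPARQL Update form, the top of the hierarchy in Figure 24 corresponds to axioms along the following lines. This is a sketch only: the prefix is a placeholder, and the Location class, whose position was discussed above, is omitted.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <http://example.org/cm#>

INSERT DATA {
  # ManagedElement is the common superclass, directly below owl:Thing
  :ManagedElement a owl:Class .
  :ManagedSystemElement a owl:Class ; rdfs:subClassOf :ManagedElement .
  :Product a owl:Class ; rdfs:subClassOf :ManagedElement .
  :FRU a owl:Class ; rdfs:subClassOf :ManagedElement .
  :SoftwareIdentity a owl:Class ; rdfs:subClassOf :ManagedSystemElement .
  :PhysicalElement a owl:Class ; rdfs:subClassOf :ManagedSystemElement .
}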

The choice of domains and ranges was a balancing act, as the most generic classes should serve as the domain or the range, but on the other hand they should not be so general that they no longer encapsulate the data meaningfully. Because there was no prior art for defining the domains and ranges, they were selected to be identical to the class hierarchy, as shown in Figure 25.

In practice, this means that, for example, the SoftwareIdentity domain describes the SWID range using the hasSWID property. Because only one property was defined for each individual, no restrictions needed to be defined; as a result, for example, a SoftwareIdentity may exist without a SWID. Disjointness is inherited down the class tree so that, for example, a SWID cannot be a PhysicalElement, because its superclass, SoftwareIdentity, has been declared disjoint with PhysicalElement.

Figure 25. The hierarchy and the relationships between the classes.
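Written out in the same sketch form (placeholder prefix as before), the hasSWID property and the disjointness just described look like this:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <http://example.org/cm#>

INSERT DATA {
  :hasSWID a owl:ObjectProperty ;
      rdfs:domain :SoftwareIdentity ;
      rdfs:range :SWID .
  :SWID rdfs:subClassOf :SoftwareIdentity .
  # inherited down the class tree: no SWID can therefore be a PhysicalElement
  :SoftwareIdentity owl:disjointWith :PhysicalElement .
}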

The competency questions did not specify data property values, so they were adapted from the CIM Schema, as shown in Figure 26. Some data properties considered unnecessary were excluded, because the information does not need to be included for foreseeable applications. The remaining data properties were assigned to the highest possible domain; for example, Name has the domain ManagedElement and the range xsd:Name, because Name is a data property common to all classes. This reduced the number of redundant data properties among the classes. The property ranges were selected from the standard XML and RDF schemas, abbreviated as xsd and rdf respectively.

Figure 26. The domain and the range of the data properties.
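As a sketch of the pattern in Figure 26 (placeholder prefix), the Name property would be declared as follows:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <http://example.org/cm#>

INSERT DATA {
  # assigned to the highest possible domain so that all subclasses inherit it
  :Name a owl:DatatypeProperty ;
      rdfs:domain :ManagedElement ;
      rdfs:range xsd:Name .
}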

Populating the ontology with individuals' data can be cumbersome if done manually.

Therefore, it is usually left to software agents, which collect the data and upload it to the appropriate graph databases. The scope of the thesis was limited to the manual entry of individuals, and therefore they were imported from a spreadsheet using the Cellfie Protégé plugin. The first task is to create the transformation rules, as illustrated in Figure 27.

The domain-specific language defines the mappings from the spreadsheet, where the Products and the Locations are separated into different sheets. In particular, the literal data types must be defined, as their absence would result in ambiguity in the data. Once all the transformation rules have been written, the axioms are generated by clicking the Generate Axioms button at the bottom. Cellfie generates the axioms automatically and displays a preview, as shown in Figure 28.

Figure 27. The Cellfie plugin imports individuals to the ontology.

Figure 28. Cellfie generates the axioms automatically.

The core question of the ontology is how to choose the right class for individuals; for example, a Cisco 1841 router can be both a PhysicalElement and a ManagedSystemElement. The main difference between these classes is that a ManagedSystemElement always distinguishes individuals based on serial number.

The answer can be obtained from configuration risk analyses by identifying critical CIs that pose a high risk to SOI availability. The Cisco 1841 router shown in Figure 29 is critical to SOI availability because the primary function of a router is to transfer data reliably, and a failed router can slow data traffic or interrupt it completely. In contrast, hardware failures of the 4-port 10/100BASE-T Ethernet switch cards shown in Figure 30 are relatively easy to diagnose, the cards are easily replaceable, and a failure affects a limited amount of communication. As a result, the collective CI pattern can be applied to the interface card, which is defined as a PhysicalElement, thus reducing the number of CIs. These decisions may nevertheless change in the future, because the ontology is expected to evolve iteratively over the life cycle of the SOI.

Figure 29. An example of the ManagedSystemElement Cisco 1841_003.

Figure 30. An example of the PhysicalElement 4-port 10/100BASE-T Ethernet switch.
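The two modelling outcomes can be sketched as follows. The router and chassis identifiers follow the unit tests of Section 5.3.1, whereas the SerialNumber value and the interface card identifier are hypothetical:

PREFIX : <http://example.org/cm#>

INSERT DATA {
  # an individually tracked CI, distinguished by its serial number (Figure 29)
  :CISCO1841_003 a :ManagedSystemElement ;
      :SerialNumber "SN-0003" ;                  # hypothetical value
      :hasPhysicalElement :Cisco_1841_chassis .

  # a collective CI: one individual represents all interchangeable cards (Figure 30)
  :Ethernet_switch_4port a :PhysicalElement .    # hypothetical identifier
}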

The ontology is available as RDF/XML and Turtle files in version 2.0 [56], which is the version used in this thesis, under the MIT license.

5.3 Testing the Ontology

Incorporating testing into ontology development improves the quality of ontologies in terms of their completeness and aligns them with standardized ontologies and standards.

Furthermore, the testing process is useful for ontology engineers because it supports ontology verification and validation activities. Validation ensures that the ontology meets the requirements of the domain experts and other identified stakeholders, while verification assesses whether the ontology meets the specifications. The competency questions summarize the requirements of the domain experts and determine the scope and the preliminary specification of the ontology.

The specifications obtained from the competency questions were supplemented with the requirements set by the CIM Schema, which enables the compatibility of the ontology with software agents and other information systems. The test cases and axioms executed during the development of the ontology, as well as the specifications formulated as unit test cases, verify that the implementation meets the specification. Because the prototype is only intended to demonstrate the feasibility of the chosen solution, the initial test cases are not exhaustive and test coverage is low. Subsequent iterations, which are expected to improve the ontology, will also increase the number and improve the quality of the test cases.

The first step in the ontology validation is to load the ontology into the validation tool RDFUnit [57], which automatically creates and executes test cases. The validation activity is complete when the competency questions have been answered by comparing the classes implemented in the ontology with the questions, and the feedback of the domain experts has been analysed.

5.3.1 Verification

Once the classes have been defined, the first test cases can be written to detect and correct inconsistencies and incoherencies, as shown in Figure 31. Entailed test cases are expected to be true, whereas non-entailed ones are expected to be false. Together, the entailed and non-entailed test cases ensured that the competency questions were fully implemented.

OntoDebug identifies the axioms responsible for inconsistencies and incoherencies in a faulty ontology by applying interactive ontology debugging. This iterative process reduces the set of possibly faulty axioms until the final set of faulty axioms is identified. OntoDebug also provides a repair interface that helps repair faulty axioms by deleting or editing them.

Initially, the test cases were trivial, but as the ontology expanded, an increasing number of errors occurred in the definitions of domains, ranges, properties, and literals, and identifying them became difficult without the assistance of the test cases.

Figure 31. The example of the executed and passed test cases.

Once the ontology was consistent and coherent and all entities were defined, it was exported as a Turtle file and uploaded to the RDF4J NativeStore using the Rio parser for RDF, as listed in Appendix 1. The feature and schema unit tests, written as SPARQL queries, cover the verifiable functional requirements, as listed in Appendix 2 and Appendix 3.
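In principle, the same upload could also be performed with a single SPARQL 1.1 Update operation instead of the Rio-based Java code of Appendix 1; the file URI below is a placeholder:

LOAD <file:///path/to/ontology.ttl>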

The results of the JUnit [58] test run are shown in Figure 32. The Maven project, including the unit test cases, is available in version 2.0 [55], which is the version used in this thesis, under the MIT license.

Figure 32. The JUnit test run results.

Probably the most common category of database unit tests is the feature tests listed in Table 7 and Appendix 2, which ensure that either the expected results are returned or the appropriate behaviour occurs.

Table 7. The feature unit test cases.

Unit test case: testIfhasSWIDandProductTitle()
Checks if CISCO_IOS_12_4 has the SWID Cisco_IOS and Cisco_IOS has the product title as a string.
SPARQL query:
  ASK { :CISCO_IOS_12_4 :hasSWID :Cisco_IOS .
        :Cisco_IOS :ProductTitle "Cisco IOS"^^xsd:string }
Expected value: true

Unit test case: testIfhasPhysicalElement()
Checks if CISCO1841_002 has the physical element Cisco_1841_chassis.
SPARQL query:
  ASK { :CISCO1841_002 :hasPhysicalElement :Cisco_1841_chassis . }
Expected value: true

Unit test case: testIfhasLocation()

The schema tests listed in Table 8 and Appendix 3 return the expected set of data values and types in the appropriate order. For example, one test ensures that the database actually contains the nine classes that were expected.

Table 8. The schema unit test cases.

Unit test case: testCountofClasses()
Checks that the database contains the expected number of classes.
Expected value: 9
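The actual queries are listed in Appendix 3; an illustrative version of such a class count could be written as:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT (COUNT(DISTINCT ?class) AS ?classCount)
WHERE {
  ?class a owl:Class .
  FILTER(!isBlank(?class))   # ignore anonymous class expressions
}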

The problem with the CSV format is that there is no single definition, and it is difficult to identify the methods that should be used to create and interpret a file. A CSV file separates the data fields with commas, but the fields themselves can also contain commas or embedded line breaks. CSV parsers may not process such field data, or they may require quotation marks around the field; for example, a field containing a comma is typically written as "Espoo, Finland". Because fields can also contain embedded quotation marks, a CSV implementation may additionally need escape characters, typically doubling the quotation marks.

5.3.2 Validation

RDFUnit is a test-driven data debugging framework that can be run automatically based on schema test cases. All test cases are executed as SPARQL queries using a pattern-based transformation method. The validation begins by uploading the ontology as a Turtle file to the RDFUnit test demo page [57]. When ontology validation is selected, the validator executes the test generators, automatically creates the RDFUnit test cases, and executes them against the data set, as shown in Table 9.

Table 9. The RDFUnit test report.

Results

Total test cases      123
Succeeded             121
Failed                2 (warnings)
Violation instances   285

Failed test cases:
http://www.w3.org/2000/01/rdf-schema#label is missing proper range
http://www.w3.org/2000/01/rdf-schema#comment is missing proper range

The domain and the range are the two most frequently expressed axioms in ontologies, and they therefore produce many automated test cases and good test coverage. Consequently, the most common errors across all ontologies are generated from the rdfs:domain and rdfs:range test cases, but errors in such test cases alone do not make an ontology poor [59]. In light of these considerations, the two errors above were considered to have little effect on the quality of the ontology, and they can be corrected during subsequent iterations.
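The following query illustrates the kind of pattern-based check that RDFUnit generates for a range axiom; it is not the exact generated test case, but it shows the principle of listing the violating triples:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?resource ?value
WHERE {
  ?resource rdfs:label ?value .
  FILTER(!isLiteral(?value))   # rdfs:label values are expected to be literals
}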

The final validation step is to confirm that the requirements are satisfied. This activity included presenting the prototype to the domain experts at a meeting and gathering their feedback on the following questions.

Question 1. Does the ontology fully meet the original requirements defined in the competency questions?