
Metropolia University of Applied Sciences
Master of Engineering, Information Technology
Master's Thesis
27 November 2020

Harri Valkonen

An Ontology-Based Configuration Management Model for Network Devices


PREFACE

Inference, reasoning and ontology are pillars of philosophy. So, what could be a more interesting thesis topic than one that combines the three? Configuration management has proven to be a challenging process in many ways, in part because of the challenges associated with configuration management databases. In this thesis I attempted to address these challenges on a small scale using ontologies and logic. I concluded that the easiest way to present an idea is a prototype: it concretizes the solution to the problem and provides a good basis for further discussion. Of course, not everything went as planned; the schedule, for example, slipped for a number of reasons, one of which was COVID-19, which made it difficult to meet in person. Nevertheless, I learned a lot about ontology modelling and the use of graph databases, for which I am very grateful.

I would like to thank my colleagues at Millog Oy and my instructor, Senior Lecturer Sami Sainio, for their valuable comments and insights. In addition, I am grateful to all my superiors who have encouraged me to pursue a master's degree in recent years.

" What I cannot create, I do not understand" - Richard Feynman

Riihimäki, 27 November 2020
Harri Valkonen


Author: Harri Valkonen
Title: An Ontology-Based Configuration Management Model for Network Devices
Number of Pages: 43 pages + 3 appendices
Date: 27 November 2020
Degree: Master of Engineering
Degree Programme: Information Technology
Instructor: Sami Sainio, Senior Lecturer

Configuration management is a key systems engineering practice applied over the system life cycle that allows an organization to identify, control, maintain, and verify the versions of configuration items (CIs). The organization typically stores information about configuration items in a configuration management database (CMDB). Therefore, effective configuration management requires a common vocabulary between people and software systems.

CMDB applications often require considerable customization because the needs of the organization conflict with the standard functionality. Defining templates and relationship types for configuration item records is the responsibility of the organization. This can result in data models that are either very complex or lack essential information. A CMDB that is designed to be excessively complex can be difficult to maintain.

The scope of the thesis was to develop a data model and a prototype for managing network device configurations that enable a common vocabulary among people and software agents. In addition, the thesis suggests methods to reduce the complexity of the CMDB and improve its maintainability. Demonstration of the use of software agents was excluded, and only the first development iteration was included in the thesis.

The presentation prototype developed in the thesis demonstrates that the proposed application is feasible and that its functionality meets the requirements. The prototype utilized Protégé for ontology development and Maven with Eclipse RDF4J for graph database testing. Test-driven ontology development included defining user requirements in the form of competency questions, developing the ontology, and verifying and validating it.

The result of the thesis is an ontology-based data model that enables a common vocabulary between people and software agents. The Common Information Model (CIM) Schema is a specification and conceptual framework for describing software and hardware elements that consists of classes, attributes, relationships between these classes, and inheritance. Because the developed ontology is based on the CIM Schema, software agents can communicate with the database directly, while domain experts can communicate with the database using a spreadsheet. Ontology maintainability was achieved primarily by designing a simple, small, and reusable ontology utilizing best practices. The number of CIs was reduced by using the collective CI pattern for elements that pose little risk to the primary operation of the system. This approach also allows distributed CMDB graph databases close to the systems, which can be queried using SPARQL federated queries.

Keywords ontology, configuration management, SPARQL, semantic integration, agile software development, graph databases


Tiivistelmä

Author: Harri Valkonen
Title: An Ontology-Based Configuration Management Model for Network Devices (Ontologiaan perustuva konfiguraationhallintamalli tietoverkon laitteille)
Number of Pages: 43 pages + 3 appendices
Date: 27 November 2020
Degree: Master of Engineering (insinööri, ylempi AMK)
Degree Programme: Information Technology
Instructor: Sami Sainio, Senior Lecturer

Configuration management is a key systems engineering practice applied over the system life cycle that allows an organization to identify, control, maintain, and verify the versions of configuration items. The organization typically stores information about configuration items in a configuration management database. Therefore, effective configuration management requires a common vocabulary between people and software systems.

CMDB applications often require considerable customization because the needs of the organization conflict with the standard functionality. Defining record templates and relationship types for configuration items is the responsibility of the organization. This can result in data models that are either very complex or lack essential information. Over-engineering the CMDB can lead to poor maintainability.

The purpose of the thesis was to develop a data model for managing network device configurations that enables a common vocabulary between people and software agents. In addition, the thesis proposes methods to reduce the complexity of the CMDB and improve its maintainability. Software agents and their application were excluded from the work, and only the first development iteration was included in the thesis.

In addition, a prototype was developed to demonstrate that the proposed application is feasible and that its functionality meets the requirements. The prototype utilized the Protégé software for ontology development and the Maven Eclipse RDF4J environment for graph database testing. Test-driven ontology development included defining the user requirements in the form of questions, developing the ontology, and verifying and validating the ontology.

The result of the thesis is an ontology-based data model that enables a common vocabulary between people and software agents. The CIM Schema is a specification and conceptual framework for modelling software and hardware using classes, attributes, relationships between classes, and inheritance. Because the developed ontology is based on the CIM Schema, software agents can communicate with the database directly, and the domain experts can access the database using a spreadsheet. Ontology maintainability was achieved primarily by designing a simple, small, and reusable ontology that utilizes best practices. The number of configuration items was reduced by applying a collective pattern to configuration items that pose only a minor risk to the operation of the system. The approach also enables distributed configuration graph databases located close to the systems, which can be queried centrally with the SPARQL language.

Keywords: ontology, configuration management, SPARQL, semantic integration, agile software development, graph databases


Contents

Preface
Abstract
Tiivistelmä
List of Figures
List of Abbreviations

1 Introduction
2 Method and Material
3 Current State Analysis
3.1 Identifying and Selecting the CIs
3.2 Semantic Interoperability
4 Theoretical Background
4.1 URIs and Triples
4.2 Linked Data and Graphs
4.3 Reasoners
4.4 Graph Databases
4.5 Test-driven Ontology Development
5 Results and Analysis
5.1 Defining the Ontology Requirements
5.2 Planning and Implementing the Ontology
5.3 Testing the Ontology
5.3.1 Verification
5.3.2 Validation
6 Discussions and Conclusions

References
Appendices
Appendix 1. Rio parser uploads Turtle file to RDF4J NativeStore
Appendix 2. The feature unit test cases
Appendix 3. The schema unit test cases


List of Figures

Figure 1. Millog Oy is a strategic partner of the Finnish Defence Forces [1].
Figure 2. The pillars of CM [2].
Figure 3. The example of the SOI breakdown structure and CIs [5].
Figure 4. Potential candidates for the CIs [5].
Figure 5. Design science research method process model [7].
Figure 6. The use of collective CI patterns reduces the total number of the CIs [8].
Figure 7. The example of the pizza ontology shown as the graph [12].
Figure 8. The ontology consists of eleven classes grouped into four categories [15].
Figure 9. The example of data ambiguity [24].
Figure 10. Schematic representation of classes, properties and individuals [25].
Figure 11. The illustration of the URL, URN, and URI [26].
Figure 12. The illustration of triples notation index.html hasDate 2020-05-27 [21].
Figure 13. The graphical representation of the axioms [21].
Figure 14. The classes overlap by default [25].
Figure 15. The meaning of the subclass [25].
Figure 16. The object property links Bob to Alice [25].
Figure 17. The data property linking "index.html" to the data literal "2020-05-27" [25].
Figure 18. The domain Pizza has the range PizzaTopping [25].
Figure 19. The example of reasoning [36].
Figure 20. The workflow for the test-driven ontology development [46].
Figure 21. Initial CIs whose content is explained at a general level.
Figure 22. The prototype development process.
Figure 23. The architecture of the prototype.
Figure 24. The class hierarchy.
Figure 25. The hierarchy and the relationships between the classes.
Figure 26. The domain and the range of the data properties.
Figure 27. The Cellfie plugin imports individuals to the ontology.
Figure 28. Cellfie generates the axioms automatically.
Figure 29. The example of ManagedSystemElement Cisco 1841_003.
Figure 30. The example of PhysicalElement 4-port 10/100BASE Ethernet switch.
Figure 31. The example of the executed and passed test cases.
Figure 32. The JUnit test run results.
Figure 33. The polyhierarchy as the diagram.
Figure 34. The proposed CMDB architecture.


List of Abbreviations

CI Configuration Item
CM Configuration Management
CIM Common Information Model
CMDB Configuration Management Database
CSCI Computer Software Configuration Item
FRU Field Replacement Unit
HTML Hypertext Markup Language
HWCI Hardware Configuration Item
IT Information Technology
OQuaRE Ontology Quality Evaluation Framework
OWL Web Ontology Language
RDF Resource Description Framework
SOI System of Interest
SPARQL SPARQL Protocol and RDF Query Language
SPDX Software Package Data Exchange
SWID Software Identification Tag
URI Uniform Resource Identifier
URL Uniform Resource Locator
URN Uniform Resource Name


1 Introduction

Millog Oy provides lifecycle support services for Army, Navy and Air Force defence materiel to its strategic partner, the Finnish Defence Forces, by participating in materiel management, as shown in Figure 1. Millog Oy is an expert in the special requirements of these systems, from product information management to repair and maintenance. In addition, Millog Oy offers lifecycle support services to other government and commercial customers. Millog Oy's workshops and offices are dispersed across 25 locations around Finland, and the company employs over 1,000 professionals. [1]

Figure 1. Millog Oy is a strategic partner of the Finnish Defence Forces [1].

Configuration Management (CM) is a key systems engineering practice applied over the System of Interest (SOI) life cycle. When properly implemented, CM enables an organization to identify, control, maintain, and verify the versions of Configuration Items (CIs). The CM process includes the configuration identification, status accounting, change control and configuration audit phases, as shown in Figure 2. [2]


Figure 2. The pillars of CM [2].

Firstly, configuration identification involves selecting the CIs and their relationships, defining the configuration information, and forming the configuration baselines. The status accounting process records and reports configuration information; recording can be manual or automatic using automated discovery tools. Status accounting identifies the necessary cross-references and relationships in the stored configuration data to generate dependency reports. The change management process ensures that only approved change proposals are implemented after the evaluation and verification phases. Information on changes and concessions is documented so that actions and decisions can be traced. Finally, the changes must be reviewed as required and audited with a focus on the changes. [3,4]

The complexity of the system often determines the definition of the SOI. The SOI is divided into systems and key elements at different levels of functional design. Elements can be further subdivided into components, parts, and even raw materials. All systems, elements, and components are candidates for CIs, but the SOI is always the highest-level CI, as shown in Figure 3. [5]


Figure 3. The example of the SOI breakdown structure and CIs [5].

The CIs can vary in complexity, size, and type, from the entire service or system, including all hardware, software and documentation, to a software module or hardware component, as shown in Figure 4. CIs consisting only of software systems and elements are called computer software configuration items (CSCIs), and hardware system elements are accordingly called hardware configuration items (HWCIs). [5]

Figure 4. Potential candidates for the CIs [5].

Typically, an organization stores information about the CIs in a configuration management database (CMDB). Therefore, an effective CM process requires a common vocabulary between people, organizations, and software systems. The diversity of CMDB applications results in a wide range of products, most of which are sales-oriented and very complex. In addition, CMDB applications require considerable customization because the needs of the organization conflict with the standard functionality. The CMDB often supports CI record templates in which relationships are freely definable. Defining the templates and relationship types for the CI records remains the responsibility of the organization. This can result in data models that are either very complex or lack essential information. [8] Designing a CMDB that is excessively complex can result in poor maintainability. Each CI, especially a CSCI, includes related technical reviews, audits, qualification evaluations, formal unit and integration tests, and documentation requirements. Each of these incurs maintenance costs. [5]

Network equipment companies use several semantic models, terminologies, and meanings to differentiate their products in the marketplace. Network routers and switches often have similar characteristics: some layer 3 switches can perform IP routing, while large routers may accept switching modules. These overlaps result in data ambiguity in the classification of devices.

The aim of the thesis was to develop a data model and implement a CMDB presentation prototype. The presentation prototype [6] demonstrates a technical solution to a partially unclear problem in order to convince the domain experts that the proposed application is feasible and that its functionality meets the requirements. In addition, the prototype clarifies the domain experts' and management's requirements for the future system. In this thesis, the prototype was developed only to acquire information, without an immediate intention to develop an application system. In particular, the thesis attempts to answer the following research questions:

1. What is the data model for managing network device configurations that allows common vocabulary among people and software agents?

2. What actions can be proposed to reduce the complexity of the CMDB and improve its maintainability?

Because the research questions were open-ended, the scope of the thesis was limited by the following constraints.


1. Demonstration of the use of software agents was excluded from the thesis, but a spreadsheet user interface was included.

2. Prototype development is likely to be an iterative process, but only the first iteration was included in the scope of the thesis.

The thesis is divided into six chapters. The first chapter outlines the problem and the research questions. The second chapter explains the research method, and the third chapter investigates the current situation. The fourth chapter provides the building blocks of a graph database and linked data. The fifth chapter presents the prototype database developed for the evaluation and discusses the results of the evaluation. The final, sixth chapter discusses the progress made in addressing the research questions and identifies remaining gaps that require future work.

2 Method and Material

Due to the somewhat ambiguous nature of the research problem, the design science research method [7] seemed to be the most appropriate problem-solving method. The design science research method creates and evaluates information and communication technology artefacts that aim to solve the identified problems iteratively in six consecutive steps, as shown in Figure 5. The arrows indicate the nominal process sequence, but in practice the author did not always proceed in sequential order from activity one to six. Since the idea of the study arose from the observation of the problem, the starting point of the study was the problem-centred approach. Firstly, the research problem was identified in a meeting with the domain experts, where the problem was defined so that the solution could fully describe the complexity of the problem. The desired functionality of the artefact and its architecture were then determined based on the literature search, and finally the actual artefact was created. The demonstration phase includes the artefact's use in experimentation, visualization, and testing activities. The evaluation compares the actual findings resulting from the use of the artefact in the demonstration phase with the solution objectives. In addition, interviews and meetings with the domain experts enabled a more detailed search for ideas. The flexibility of this approach allows for the discovery of information that is important to the domain experts but which may not have been previously considered relevant by the author. At the end of the evaluation, it was decided to move to the communication phase and leave improvements for future projects.


Figure 5. Design science research method process model [7].

The reliability and validity of the study refer to the extent to which the same results can be obtained using the same tools and processes more than once. Measures taken to ensure the reliability and validity of this thesis include the following:

a) All selected development tools are open source and easily accessible.

b) The appropriate research method has been selected considering the characteristics of the research.

c) Ethical issues, confidentiality and anonymity were guaranteed in the domain experts' interviews and meetings.

d) The artefact was made publicly available.


3 Current State Analysis

This chapter summarizes the literature review, which focused on the issues raised in the research questions. From the research questions, it was inferred that the new CMDB prototype is expected to be better in terms of adaptability, maintainability and support for intelligent agents.

3.1 Identifying and Selecting the CIs

Identifying and selecting the CIs is an optimization problem. According to the ISO/IEC 20000 standard, a CI is any element that must be controlled in order to provide a service [3,4]. However, as the number of CIs increases, so does the number of interfaces and, accordingly, the complexity of the CMDB. Each interface must be precisely defined and documented, resulting in an increase in the cost of the formal SOI testing program. In addition, the administrative work related to configuration changes increases. On the other hand, if the number of CIs is too small, each CI becomes very complex and more difficult to maintain, especially if it contains both hardware and software. [5]

Nonetheless, the CI identification and selection process should aim for the highest possible degree of independence between the CIs, while achieving the appropriate level of governance required to manage the SOI life cycle. This also reduces the complexity of the system and its technical documentation.[2]

The use of Fowler's analysis patterns may limit the number of CIs. Fowler defines an analysis pattern as "an idea that has been useful in one practical context and will probably be useful in others". [9] The collective CI pattern combines components that have identical configurations or cannot be configured, such as a keyboard or mouse. The number of CIs is reduced, but the individual relationships of each component are not documented, as shown in Figure 6. The first supercomputer CMDB model mimics the system architecture by creating a CI for each hardware node. Each CI is then linked to the corresponding island, creating more than 9000 individual CIs. However, all nodes contain only two types of hardware: so-called thin nodes with two eight-core processors and fat nodes with four 10-core processors. Each node can be booted into one of two software configurations, compute node or login node. Therefore, the nodes share the same software-related events, issues, and changes. As hardware faults in individual nodes are relatively easy to diagnose and nodes are interchangeable, there is little added value in distinguishing individuals. As a result, the second model, which uses the collective CI pattern, contains only four CIs. [8]


Figure 6. The use of collective CI patterns reduces the total number of the CIs [8].

3.2 Semantic Interoperability

Information technology has evolved from tightly coupled local systems and databases to the integration of multiple systems distributed across organizations. Most websites and application programming interfaces are designed for people, which makes it difficult for computer systems to exchange information with unambiguous and shared meaning. [10]

In the context of computer and information science, an ontology is a formal description of knowledge as a set of concepts within a domain and the relationships between them [11].

Ontologies are used to semantically annotate real-world objects, as shown in Figure 7 [12].

Figure 7. The example of the pizza ontology shown as the graph [12].

Ontologies provide methods to identify correspondences between heterogeneous data models [10]. Ontology mapping can be implemented between ontologies so that the semantics of concepts can be matched accordingly. Semantic mapping solves the problem of semantic interoperability. Furthermore, the domain information captured in the ontology provides the foundation for the intelligence of software agents. Table 1 compares the ontology approach to the standards-based approach in terms of semantic interoperability. [13]

Table 1. Advantages of the ontology approach [13].

Feature: Reusability
Standard-based approach: Application and domain specific.
Ontology-driven approach: Separate ontologies can be defined for different applications and domains.

Feature: Adaptability
Standard-based approach: New standards or upgrades are required to support new technology.
Ontology-driven approach: Until the semantics associated with new techniques are fully integrated into existing ontologies, new techniques can be temporarily supported by analysing their relationship to embedded semantics in existing ontologies.

Feature: Support for intelligent agents
Standard-based approach: Standards usually define the format, syntax, and structure of information. Their formality supports machine processing, but the lack of semantics does not allow intelligent processing.
Ontology-driven approach: Semantics embedded in ontologies allow agents to formulate intelligent decisions.

Most data networks are constructed from equipment supplied by different manufacturers. The semantic interoperability problem exists because devices from different manufacturers have different functions, interfaces, and programming models. [13] The similarity between the different facts represented by the data models is established by mapping each fact to a common vocabulary. This allows each fact to be augmented with semantics. [12]

Software systems are a set of packages and the dependencies between them. Packages may require specific package combinations to operate properly. In addition, new dependencies arise as packages are added and new versions are continuously integrated into the system. Version restrictions determine which version range works with other versions. Various package dependency and version constraints can be encoded in the ontology, allowing for automatic configuration reasoning, flexible querying, and the organization of documents and package versions. [14]
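As an illustration only (the class and property names below are hypothetical and not taken from [14]), such dependency and version constraints could be encoded in Turtle roughly as follows:

@prefix : <http://www.example.org/pkg#> .

:libfoo_1_2 a :SoftwarePackage ;
    :packageName "libfoo" ;
    :version     "1.2" ;
    :dependsOn   :libbar_requirement .

# The requirement node captures the version range that works with libfoo 1.2
:libbar_requirement a :PackageRequirement ;
    :onPackage      :libbar ;
    :minimumVersion "2.0" ;
    :maximumVersion "2.9" .

:libbar a :SoftwarePackage ;
    :packageName "libbar" .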

An ontology-based CMDB web application has recently been developed using Protégé [12] and the Apache Jena Framework [16]. The developed ontology consists of ten property links and eleven classes, grouped into four categories: hardware, software, document, and Information Technology (IT) service management, as illustrated in Figure 8. The project class was the highest in the hierarchy and the common superclass of all other ontology classes. In addition, version control was implemented using the IsNewVersionOf property, which links the current and previous versions of any CI. [15]


Figure 8. The ontology consists of eleven classes grouped into four categories [15].

Reuse of existing ontologies may be a requirement if the system is to interact with applications that already have managed vocabularies or ontologies [17]. The widely recognized DMTF standard for resource description, the Common Information Model (CIM), is an organizationally supported and regularly updated meta-model for complex computer systems. It provides a common definition of management information for systems, networks, applications, and services, and enables manufacturer-specific extensions. The CIM Schema is based on object-oriented and database modelling and consists of separate layers: the core model, the common model, and the user-developed extension schemas. [18] The CIM Schema can be used as a starting point for listing important terms in the ontology [19].

Ontology quality indicators provide an objective, standardized framework for assessing an ontology to identify strengths and weaknesses. Some aspects of ontology maintainability are modularity, reusability, and analysability. Firstly, modularity measures the extent to which the ontology consists of discrete entities such that a change in one entity has little effect on the other entities. Reusability is the degree to which a part of an ontology can be used in more than one ontology or in the construction of other ontologies. Finally, analysability indicates whether the ontology can be diagnosed for defects or inconsistencies and whether the parts to be modified can be identified. [20]

4 Theoretical Background

Semantic web technologies make it possible to build vocabularies, write data processing rules, and create linked repositories that are implemented using technologies such as RDF [21], SPARQL [22], and the Web Ontology Language (OWL) [23]. Data ambiguity, as shown in Figure 9, is prevented using Uniform Resource Identifiers (URIs) and ontologies [24]. The ontology is a set of axioms that provide clear logical statements about classes, individuals, and properties, as shown in Figure 10. Axioms refine domain knowledge, for example, Alice hasDog Milou. Entities encompass all concepts of classes, individuals, and properties. [11]

Figure 9. The example of data ambiguity [24].

Figure 10. Schematic representation of classes, properties and individuals [25].


4.1 URIs and Triples

The URI identifies an abstract or real entity globally, and it has two specializations [26]. The Uniform Resource Name (URN) defines the identity of the resource [27], while the Uniform Resource Locator (URL) provides the location of the resource using physical addresses [28]. Each document or program can be identified and referenced by URIs as shown in Figure 11 [26].

Figure 11. The illustration of the URL, URN, and URI [26].

The URI reference is a URI that is extended with an optional fragment identifier at the end. For example, http://www.w3.org/2001/XMLschema#date has the URI http://www.w3.org/2001/XMLschema, usually abbreviated as xsd:, and the fragment identifier date, separated by the "#" character. Fragment identifiers are often interpreted as URLs that identify a place in Hypertext Markup Language (HTML) documents, whereas in ontologies they are interpreted as URIs of different resources without a relationship. [21]

All ontology data is described in subject–predicate–object triples notation as illustrated in Figure 12.[21]

Figure 12. The illustration of triples notation index.html hasDate 2020-05-27 [21].


The subject of this axiom is the web page index.html, which identifies the thing to which the axiom relates. The predicate hasDate determines the property of the subject. Finally, the object 2020-05-27 identifies the thing that is the value of the property. [21]
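As a minimal sketch, the same triple could be written in Turtle as follows (the ex: prefix and the hasDate property name are illustrative; the xsd:date datatype is discussed later in this chapter):

@prefix ex:  <http://www.example.org/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Subject: index.html, predicate: ex:hasDate, object: a typed date literal
<http://www.example.org/index.html> ex:hasDate "2020-05-27"^^xsd:date .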

4.2 Linked Data and Graphs

Axioms can be represented as a graph with the subject and object represented as nodes and the predicate as an arc, as shown in Figure 13. The URI names the links, both the subject and the object at the end of the link.[21]

Figure 13. The graphical representation of the axioms [21].

The ontologies are assumed to represent partial information, while in relational databases, information is implicitly considered complete, as shown in Table 2. The assumption of an open world means that everything unknown is not assumed to be false, but simply unknown.[25]


Table 2. The assumptions of the open and closed world [25].

Given facts: spicy topping and tomato topping are vegetarian toppings; meat topping is not a vegetarian topping.

Questions asked under both assumptions:
Is spicy topping a vegetarian topping?
Is meat topping a vegetarian topping?
Is jalapeño topping a vegetarian topping?

From the example of the vegetarian pizza above, it can be inferred that the open world assumption results in overlapping classes by default, as shown in Figure 14. This means that an individual pizza can be both a SpicyPizza and a VegetarianPizza at the same time. In situations where classes should not overlap, they must be explicitly declared disjoint using a disjointness axiom. Disjointness denotes that individuals in one class cannot be individuals in another class at the same time. [25]

Figure 14. The classes overlap by default [25].

A subclass relation means that if VegetableTopping is a subclass of PizzaTopping, then all VegetableTopping instances are, without exception, PizzaTopping instances, as shown in Figure 15 [25].
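A minimal Turtle sketch of these two kinds of axioms, using the pizza example (the class IRIs are illustrative):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://www.example.org/pizza#> .

# Every VegetableTopping is, without exception, also a PizzaTopping
:VegetableTopping rdfs:subClassOf :PizzaTopping .

# An individual cannot be a VegetableTopping and a MeatTopping at the same time
:VegetableTopping owl:disjointWith :MeatTopping .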


Figure 15. The meaning of the subclass [25].

The properties are divided into object properties and datatype properties. The object property links the individual to another individual as shown in Figure 16.[25]

Figure 16. The object property links Bob to Alice [25].

The datatype property, according to Figure 17, must correspond to a conceptual framework that unambiguously defines both the value space and the lexical space. For example, the value space of the XML Schema datatype xsd:date is a set of dates, and its lexical space defines 2020-05-27 as a properly written literal. The lexical-to-value mapping of xsd:date then determines that 2020-05-27 represents the date May 27, 2020. XML Schema datatypes are treated in the same way as any other datatype, but they are widely used and therefore likely to be compatible with different software. [25]

Figure 17. The data property linking "index.html" to the data literal ”2020-05-27” [25].

Properties link individuals from the domain to individuals from the range. For example, in Figure 18, the hasTopping property links individuals in the Pizza class to individuals in the PizzaTopping class. The domain of the hasTopping property is Pizza and the range is PizzaTopping. The hasTopping property may have an inverse property, isToppingOf. [25]
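A short Turtle sketch of such a property declaration (IRIs illustrative):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://www.example.org/pizza#> .

# hasTopping links pizzas (domain) to pizza toppings (range)
:hasTopping a owl:ObjectProperty ;
    rdfs:domain   :Pizza ;
    rdfs:range    :PizzaTopping ;
    owl:inverseOf :isToppingOf .

:isToppingOf a owl:ObjectProperty .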

Figure 18. The domain Pizza has the range PizzaTopping [25].

RDF/XML [29] provides a normative syntax for writing and exchanging graphs. RDF graphs are, however, often serialized in more compact and human-readable formats such as Turtle [30], RDF/JSON [31], N-Triples [32], TriG [33], Notation 3 [34] and N-Quads [35].

4.3 Reasoners

The process by which inference calculation is used to derive implicit information from explicitly given axioms and facts is called reasoning. For example, as shown in Figure 19, if B is a subclass of A and C is a subclass of B, it can be inferred that C is a subclass of A, because the subclass relation is transitive. [36]

Figure 19. The example of reasoning [36].


The main advantage of using ontologies is the ability to use automatic reasoning and thus discover information that is difficult to identify otherwise. In addition, reasoning reduces redundant data and finds errors or inconsistencies in the ontology. Reasoners utilize first-order logic and either forward or backward chaining strategies. Forward-chaining reasoners start from the axioms and facts and try to reach a conclusion using the rules, while backward-chaining reasoners start from the conclusions and try to find the facts and axioms that support the conclusion. There are several reasoners available, such as Racer, FaCT++ and HermiT, which can be used as plugins to Protégé. [37]
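As a rough illustration of the subclass reasoning described above, a SPARQL 1.1 property path can retrieve the transitive closure of rdfs:subClassOf even without a full reasoner (the ex: prefix and class names are illustrative):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://www.example.org/onto#>

# All direct and indirect superclasses of ex:C, e.g. ex:B and ex:A
SELECT ?super WHERE {
  ex:C rdfs:subClassOf+ ?super .
}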

4.4 Graph Databases

Relational databases have been the dominant data storage solution due to their efficiency, flexibility, and performance. The relational database model is designed for a specific application or organization, unlike ontologies, which result from collective work. Ontologies are independent of implementation and are at a higher level of abstraction than data models. Ontologies mix schema definition with instances, or actual data, while relational databases clearly distinguish schema and data. In addition, the ontology instance definition does not follow rules, unlike relational databases, which must meet all requirements, such as primary keys, foreign keys, and null rules, to ensure data integrity. Thus, the integrity of the data in a relational database is always guaranteed, but ontologies require a check executed by a reasoner after each change. Ontologies utilize reasoners to decide which instances belong to the ontology depending on the consistency of the defined axioms and inheritance rules. A clear advantage is that reasoners can find new information even if no data is defined, whereas new information can only be obtained from tuples stored in a relational database. Ontology query languages, such as SPARQL, return predicates, graph structures, and triples, while relational database SQL queries produce a table structure. [38] The example SPARQL query in Listing 1 utilizes the Wikidata Query Service [39] to report all movies released on October 13, 2020. The query includes the Wikidata items (Q) and properties (P) shown in Table 3, and Table 4 lists the results of the query.


SELECT DISTINCT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q11424 .
  ?item wdt:P577 ?pubdate .
  FILTER(?pubdate = "2020-10-13T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Listing 1. The SPARQL query to download the list of movies released on October 13, 2020 [39].

Table 3. The meaning of the Wikidata items and properties [39].

Q11424: Film
P31: instance of
P577: date or point in time when a work was first published or released

Table 4. The movies released on October 13, 2020 [39].

item itemLabel

wd:Q98537283 Batman: Death in the Family

wd:Q99722030 Seized (film)

Probably the most important feature of SPARQL is the ability to federate queries across different repositories, whereas SQL does not have a standard procedure for this. Instead of retrieving and merging data locally, SPARQL queries can be written to delegate part of the query to remote query services. [40] The SPARQL query shown in Listing 2 retrieves the works of the Spanish playwright and poet Lope de Vega by combining data from the RDA Registry (rdaregistry.info) and the Miguel de Cervantes Virtual Library (data.cervantesvirtual.com). The query uses the Wikidata item Lope de Vega (Q165257) and the property BVMC person ID (P2799). The first three works are listed in Table 5. [39]


SELECT ?workLabel WHERE {
  wd:Q165257 wdt:P2799 ?id .
  BIND(uri(concat("http://data.cervantesvirtual.com/person/", ?id)) AS ?bvmcID)
  SERVICE <http://data.cervantesvirtual.com/openrdf-sesame/repositories/data> {
    ?bvmcID <http://rdaregistry.info/Elements/a/otherPFCManifestationOf> ?work .
    ?work rdfs:label ?workLabel
  }
}

Listing 2. The SPARQL query to retrieve the list of Lope de Vega works [39].

Table 5. Works by Lope de Vega, only the first three are listed [39].

workLabel
Tercera parte de las comedias de Lope de Vega y otros autores ...
Flor de las comedias de Espana de diferentes autores. Quinta parte / Recopiladas por Francisco de Ávila ...
Parte treynta vna, de las meiores comedias, que hasta oy han salido / recogidas por el Dotor Francisco Toriuio Ximenez; y a la fin va la Comedia de Santa Madrona, intitulada la viuda tirana, y conquista de Barcelona

The graph database uses nodes, edges, and properties in semantic queries to represent and store data. Nodes typically store information about things, while edges store information about the relationships between nodes. Connections allow data to be linked directly to each other and, in many cases, retrieved with a single operation. Querying relationships is fast because they are persistently stored in the database. The performance of graph databases has been found to be better than that of SQL databases when querying complex relationships between heterogeneous data in biological networks. Therefore, the graph database is the preferred option over SQL and other NoSQL databases for storing highly interconnected data. [41]


4.5 Test-driven Ontology Development

Ontology development is an iterative process, and there is no single correct way to model a domain. Ontology development involves four steps: defining the classes in the ontology, arranging the classes in a taxonomic hierarchy, defining the properties and the allowed values for the properties, and finally filling in the values of the properties for the instances.[17] Functional ontology requirements are often written in a controlled natural language in the form of competency questions. Competency questions define the scope and extent of the ontology at the definition phase. The answers to the questions may change during ontology development as the domain experts provide more detailed information to describe the domain.[12,42]

Ontology development and maintenance processes are susceptible to errors. Implicit information can result in erroneous conclusions that can only be corrected by reviewing logical statements. Such review processes can be very complex and labour intensive if the ontology is large. In addition, uncontrolled changes can cause inconsistencies if the meaning of the entity changes, undefined entities are used, or domain and range restrictions are overridden.[43]

Agile development processes define workflows designed to prevent and detect errors and mistakes in the early stages of the development process. Agile practices are characterized by strong customer engagement, iterative and test-driven development, refactoring, and frequent testing.[44] The test-driven development process begins with writing test cases, then executing and debugging the code and verifying it by running the test cases. The code evolves iteratively from test cases to the final implementation.[45] Similarly, the test-driven ontology development process includes the definition of requirements, the implementation of the logical definitions needed to fulfil the test cases, and finally the verification phase. Figure 20 illustrates the test-driven ontology development workflow divided into definition, testing, implementation, and debugging phases.[46]


Figure 20. The workflow for the test-driven ontology development [46].

Requirements are defined as positive and negative test cases, each of which is a general class axiom such as SoftwareIdentity SubClassOf ManagedSystemElement. An example of a negative test case is the axiom PhysicalElement SubClassOf SoftwareIdentity, which states that hardware should not be considered software. The implementation and debugging phases include adding missing axioms and refactoring if necessary. Due to the complexity of the inferences, debugging is difficult without a proper tool such as the Protégé plugin OntoDebug. [46]
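Such axiom-level test cases can also be phrased as queries against the ontology, which is how the prototype's unit tests are written later in this thesis. A minimal sketch, assuming an illustrative cmdb: prefix for the ontology namespace, checks the two example axioms with two SPARQL ASK queries executed separately; the first is expected to return true and the second false:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cmdb: <http://www.example.org/cmdb#>

# Positive test case (expected: true)
ASK { cmdb:SoftwareIdentity rdfs:subClassOf+ cmdb:ManagedSystemElement }

# Negative test case (expected: false), executed as a separate query
ASK { cmdb:PhysicalElement rdfs:subClassOf+ cmdb:SoftwareIdentity }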

5 Results and Analysis

This chapter presents the author's work to answer the research questions. The current state analysis highlighted that the first research question could be answered if the prototype is based on an ontology, as the inherent features of ontologies provide a common vocabulary between people and software agents. Since the ontology is based on the CIM Schema, software agents are able to collect the CIM Schema information provided by the network equipment manufacturers and communicate seamlessly with the ontology.

Nevertheless, the second research question on maintainability must be examined in more detail. Therefore, the quality of the ontology was assessed in terms of compliance, modularity, reusability, and analysability to identify strengths and weaknesses using the corresponding sub-questions:

▪ Does the ontology fully meet the original requirements defined in the competency questions?


▪ Is the ontology intuitive and easy to describe and understand?

▪ Can components be modified without affecting other components?

▪ Can the ontology be easily extended beyond network devices?

▪ How easy is it to detect deficiencies, inconsistencies and components to be repaired?

The definition, planning, implementation and testing of the prototype are described in detail in the following sections. Although the development of the ontology is an iterative process, the scope of the thesis was limited to the first iteration, and improvements were left for future projects.

5.1 Defining the Ontology Requirements

The ontology should be as simple as possible so that it is easy to maintain, but it must still meet the requirements and contain enough information to be useful. The SPARQL query language enables federated queries, and therefore the CM data can be conveniently divided into several separate graph databases. The number of CIs can be further reduced by using the collective CI pattern, as some CIs may be excluded from individual traceability. Based on the above, the requirements were defined by the domain experts in an internal workshop, where the scope of the ontology was determined using the CI diagram in Figure 21 and the competency questions in Table 6. The policy for naming the classes is derived from the CIM Schema, but the exact definition of the classes may differ from those defined in the schema.


Figure 21. Initial CIs whose content is explained at a general level.

The managed element is the abstract class that provides the common superclass and corresponds to the SOI, while components identified by serial numbers are managed system elements. Examples of managed system elements are software, operating systems, hardware, and components. The field replacement unit (FRU) is a manufacturer-defined collection of physical elements used to support, maintain, or update the managed element at the customer location. The product is the concrete class that combines physical elements and software identity and is acquired as a single entity. The software identity provides descriptive information about the software component for tracking resources and managing installation dependencies, while physical elements are concrete controlled elements of the system that have a physical manifestation. The Software Package Data Exchange (SPDX) [47] specification is a standard format for identifying the components, licenses, and copyrights associated with a software package, and the software identification tag (SWID) [48] is a standardized XML file that contains software product identification and management information.


Table 6. The initial competency questions.

1. Is software identity a managed system element? yes
2. Is physical element a managed system element? yes
3. Is software identity a physical element? no
4. Is location a managed system element? yes
5. Is product a managed element? yes
6. Is FRU a managed element? yes
7. Is managed system element a system element? yes
8. Is location a managed system element? yes
9. Is product managed element or managed system element? managed element
10. Is FRU managed element or managed system element? managed element

5.2 Planning and Implementing the Ontology

The prototype development process in Figure 22 is based on the test-driven ontology development process. The process includes defining the user requirements in the form of competency questions, test-driven ontology development, testing the graph database, and answering the competency questions.


Figure 22. The prototype development process.

The prototype was implemented using two frameworks, as shown in Figure 23. Protégé was utilized for the ontology development and Eclipse RDF4J [49] for graph database testing. Protégé is a set of tools for building domain models and knowledge-based applications with ontologies. Apache Maven [50] is a software project management tool based on the concept of a project object model, and Eclipse RDF4J is a Java framework for handling and processing RDF data, including creation, parsing, scalable storage, inference, and querying with RDF and linked data. Cellfie [51] is a Protégé extension for creating ontologies from spreadsheets, while OntoGraf [52] and VOWL [53] provide support for interactive navigation and visualization of ontology relationships. HermiT is the first publicly available OWL reasoner based on the hypertableau calculus [54].

The RDF4J native store is a transactional graph database that uses direct disk I/O, making it a more scalable solution than memory storage, with a smaller memory footprint and better integrity and durability. Finally, GitHub [55] provided version control and source code management for the development project.

Figure 23. The architecture of the prototype.

The starting point for the development of the ontology was the definition of classes using the CIM Schema classes and the competency questions. The class hierarchy was then developed from top to bottom, assuming an "is-a" relationship between classes, as shown in Figure 24. It is important to note that all subclasses of a class inherit the properties of that class. Defining the primitive class hierarchy was trivial because it accurately reflects the competency questions and the hierarchies of the CIM Schema classes. Nevertheless, two issues required more reflection and were not entirely self-evident. Firstly, the Location class was moved down in the hierarchy because the ManagedSystem to which it was originally connected was at too abstract a level to be associated with the location information. Secondly, another cumbersome class was the Product, because its information content overlaps somewhat with other information systems. Despite the fact that the Product may be unnecessary, it remained in the ontology for the time being, because the warranty date was considered important information and the expiration date of the software license could later be added to the class. It is noteworthy that owl:Thing is a concept that ranks above all other classes and must, by definition, be at the top.

Figure 24. The class hierarchy.

The choice of domains and ranges was a balancing act, as the most generic classes should be the domain or the range, but on the other hand they should not be so general that they fail to encapsulate the data. Because there was no prior art for defining the domains and ranges, they were selected to be identical to the class hierarchy, as shown in Figure 25. In practice, this means that, for example, the SoftwareIdentity domain describes the SWID range using the hasSWID property. Because only one property was defined for each individual, no restrictions needed to be defined, and as a result, for example, a SoftwareIdentity may exist without a SWID. The disjoints are inherited down the class tree so that, for example, a SWID cannot be a PhysicalElement, because its superclass, SoftwareIdentity, has been declared disjoint with PhysicalElement.
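A minimal Turtle sketch of these class axioms (the ontology's actual namespace IRI is not shown in the text, so the cmdb: prefix is illustrative):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix cmdb: <http://www.example.org/cmdb#> .

cmdb:ManagedSystemElement rdfs:subClassOf cmdb:ManagedElement .
cmdb:SoftwareIdentity     rdfs:subClassOf cmdb:ManagedSystemElement .
cmdb:PhysicalElement      rdfs:subClassOf cmdb:ManagedSystemElement .
cmdb:SWID                 rdfs:subClassOf cmdb:SoftwareIdentity .

# Software and hardware must not overlap; the disjointness is inherited by SWID
cmdb:SoftwareIdentity owl:disjointWith cmdb:PhysicalElement .

# The object property linking software identities to their SWID tags
cmdb:hasSWID a owl:ObjectProperty ;
    rdfs:domain cmdb:SoftwareIdentity ;
    rdfs:range  cmdb:SWID .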


Figure 25. The hierarchy and the relationships between the classes.

The competency questions did not specify data property values, so they were adapted from the CIM Schema, as shown in Figure 26. Some of the data properties considered unnecessary were excluded because the information does not need to be included for foreseeable applications. The remaining data properties were assigned to the highest possible domain; for example, Name has the domain ManagedElement and the range xsd:Name, because Name is a common data property for all classes. This reduced the number of redundant data properties among the classes. The property ranges were selected from the standard XML and RDF schemas, abbreviated xsd and rdf, respectively.
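A short Turtle sketch of one such declaration (again with an illustrative cmdb: prefix):

@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix cmdb: <http://www.example.org/cmdb#> .

# Name is declared once at the most general class and inherited by all subclasses
cmdb:Name a owl:DatatypeProperty ;
    rdfs:domain cmdb:ManagedElement ;
    rdfs:range  xsd:Name .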


Figure 26. The domain and the range of the data properties.


Populating the ontology with data about individuals can be cumbersome if done manually. Therefore, it is usually left to software agents, which collect the data and upload it to the appropriate graph databases. The scope of the thesis was limited to the manual entry of individuals, and therefore they were imported from a spreadsheet using the Cellfie Protégé plugin. The first task is to create transformation rules, as illustrated in Figure 27. The domain-specific language defines mappings from the spreadsheet, where Products and Locations are separated into different sheets. In particular, the literal data types must be defined, as their absence would result in ambiguity in the data. Once all the transformation rules have been written, the axioms are generated by clicking the generate axioms button at the bottom. Cellfie automatically generates the axioms and displays a preview, as shown in Figure 28.

Figure 27. The Cellfie plugin imports individuals to the ontology.


Figure 28. Cellfie generates the axioms automatically.

A core question of the ontology is how to choose the right class for individuals; for example, a Cisco 1841 router can be both a PhysicalElement and a ManagedSystemElement. The main difference between these classes is that the ManagedSystemElement always distinguishes individuals based on the serial number. The answer can be obtained from configuration risk analyses by identifying critical CIs that pose a high risk of SOI unavailability. The Cisco 1841 router shown in Figure 29 is critical to SOI availability because the primary function of a router is to reliably transfer data, and a failed router can slow down or completely interrupt data traffic. In contrast, hardware failures of the 4-port 10/100BASE-T Ethernet switch cards shown in Figure 30 are relatively easy to diagnose, the cards are easily replaceable, and a failure affects only a limited amount of communication. As a result, the collective CI pattern can be applied to the interface card, which is defined as a PhysicalElement, thus reducing the number of CIs. Despite these decisions, the classification may change in the future because the ontology is expected to evolve iteratively over the life cycle of the SOI.
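A hedged sketch of how these two kinds of CIs could appear as individuals in Turtle (the cmdb: prefix, the SerialNumber property and its value are illustrative; Name and hasPhysicalElement follow the properties described above):

@prefix cmdb: <http://www.example.org/cmdb#> .

# Individually tracked CI: a router identified by its serial number
cmdb:Cisco_1841_003 a cmdb:ManagedSystemElement ;
    cmdb:Name         "Cisco 1841" ;
    cmdb:SerialNumber "FTX0000X0XX" ;   # illustrative value
    cmdb:hasPhysicalElement cmdb:EthernetSwitchCard_4port .

# Collective CI: one record represents all interchangeable switch cards
cmdb:EthernetSwitchCard_4port a cmdb:PhysicalElement ;
    cmdb:Name "4-port 10/100BASE-T Ethernet switch card" .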


Figure 29. The example of ManagedSystemElement Cisco 1841_003.

Figure 30. The example of PhysicalElement 4-port 10/100BASE Ethernet switch.

The ontology, as RDF/XML and Turtle files, is available as version 2.0 [56], which is the version used in this thesis, under the MIT license.


5.3 Testing the Ontology

Incorporating testing into ontology development improves the quality of ontologies in terms of their completeness and aligns them with standardized ontologies and standards. Furthermore, the testing process is useful for ontology engineers because it supports ontology verification and validation activities. Validation ensures that the ontology meets the requirements of the domain experts and other identified stakeholders, while verification assesses whether the ontology meets the specifications. The competency questions summarize the requirements of the domain experts and determine the scope and preliminary specification of the ontology.

The specifications obtained from the competency questions have been supplemented with the requirements set by the CIM Schema, which enables the compatibility of the ontology with software agents and other information systems. Test cases and axioms executed during the development of the ontology, as well as specifications formulated as unit test cases, verify that the implementation meets the specification. Because the prototype is only intended to demonstrate the feasibility of the chosen solution, the initial test cases are not exhaustive, and the coverage of the tests is low. Subsequent iterations, which are expected to improve the ontology, will also increase the number and improve the quality of the test cases.

The first step in the ontology validation is to load the ontology into the validation tool RDFUnit [57], which automatically creates and executes test cases. The validation activity is completed when the competency questions are answered by comparing the classes implemented in the ontology with the questions and the feedback of the domain experts has been analysed.

5.3.1 Verification

Once the classes have been defined, the first test cases can be written to resolve and correct inconsistencies and incoherencies, as shown in Figure 31. Entailed test cases are expected to be true, whereas non-entailed test cases are expected to be false. Both entailed and non-entailed test cases ensure that the competency questions are fully implemented.

OntoDebug identifies the axioms responsible for inconsistencies and incoherencies in a faulty ontology by applying interactive ontology debugging. This iterative process reduces the set of possibly faulty axioms until the final set of faulty axioms is identified. OntoDebug also provides a repair interface that helps repair faulty axioms by deleting or editing them.


Initially, the test cases were trivial, but as the ontology expanded, an increasing number of errors occurred in the definitions of domains, ranges, properties, and literals, and identifying them became difficult without the assistance of the test cases.

Figure 31. The example of the executed and passed test cases.

Once the ontology was consistent and coherent and all entities were defined, it was exported as a Turtle file and uploaded to the RDF4J NativeStore using the Rio RDF parser, as listed in Appendix 1. The feature and schema unit tests, written as SPARQL queries, cover verifiable functional requirements as listed in Appendix 2 and Appendix 3.
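The following minimal sketch shows how such an upload can be performed with the RDF4J API. It is not the exact Appendix 1 listing; the file name, data directory and base IRI are assumptions made only for illustration.

import java.io.File;

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

public class OntologyLoader {

    public static void main(String[] args) throws Exception {
        // Persistent NativeStore in a local data directory (path is illustrative).
        Repository repository = new SailRepository(new NativeStore(new File("data/nativestore")));
        repository.init();
        try (RepositoryConnection connection = repository.getConnection()) {
            // Rio selects the Turtle parser through the RDFFormat argument.
            connection.add(new File("ontology.ttl"),
                    "http://example.org/ontology#", RDFFormat.TURTLE);
        } finally {
            repository.shutDown();
        }
    }
}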


The results of the JUnit [58] test run are shown in Figure 32. The Maven project, including the unit test cases, is available in version 2.0 [55], the version used in this thesis, under the MIT license.

Figure 32. The JUnit test run results.

Probably the most common category of database unit tests is feature tests, listed in Table 7 and Appendix 2, which ensure that either the expected results were returned or the appropriate behaviour occurred; a sketch of how one such test can be wired into JUnit follows the table.

Table 7. The feature unit test cases.

Unit test case | SPARQL query | Expected value
testIfhasSWIDandProductTitle() checks if CISCO_IOS_12_4 has the SWID Cisco_IOS and Cisco_IOS has the Product title as a string. | ASK { :CISCO_IOS_12_4 :hasSWID :Cisco_IOS . :Cisco_IOS :ProductTitle "Cisco IOS"^^xsd:string } | true
testIfhasPhysicalElement() checks if CISCO1841_002 has the Cisco_1841_chassis property. | ASK { :CISCO1841_002 :hasPhysicalElement :Cisco_1841_chassis . } | true
testIfhasLocation() checks if the CISCO1841_001 location is M1_Riihimäki. | ASK { :CISCO1841_001 :hasLocation :M1_Riihimaki } | true
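As a hedged illustration of how these table entries map to executable tests, the sketch below shows how a feature test such as testIfhasPhysicalElement() might be written with JUnit 4 and the RDF4J API. The ontology namespace, the file name and the use of an in-memory store are assumptions rather than the exact Appendix 2 listing.

import static org.junit.Assert.assertTrue;

import java.io.File;

import org.eclipse.rdf4j.query.BooleanQuery;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
import org.junit.Before;
import org.junit.Test;

public class FeatureTest {

    // An in-memory store keeps this sketch self-contained; the prototype itself uses a NativeStore.
    private final Repository repository = new SailRepository(new MemoryStore());

    @Before
    public void loadOntology() throws Exception {
        repository.init();
        try (RepositoryConnection connection = repository.getConnection()) {
            connection.add(new File("ontology.ttl"),
                    "http://example.org/ontology#", RDFFormat.TURTLE);
        }
    }

    @Test
    public void testIfhasPhysicalElement() {
        try (RepositoryConnection connection = repository.getConnection()) {
            // The ASK query from Table 7 is evaluated as a boolean query.
            BooleanQuery ask = connection.prepareBooleanQuery(
                    "PREFIX : <http://example.org/ontology#> "
                  + "ASK { :CISCO1841_002 :hasPhysicalElement :Cisco_1841_chassis . }");
            assertTrue(ask.evaluate());
        }
    }
}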

The schema tests listed in Table 8 and Appendix 3 return the expected set of data values and types in the appropriate order. For example, the test ensures that the database actually contains the nine classes that were expected.

Table 8. The schema unit test cases.

Unit test case | SPARQL query | Expected value
testCountofClasses() calculates the ontology classes and saves them in the test.csv | SELECT DISTINCT (strafter(str(?s), "#") AS ?fragment) WHERE { ?s a owl:Class } | 9
testReadCSVFile() checks if the test.csv created in the previous test contains 9 classes and the header | none | 10
testCountofProduct() counts the number of products in the ontology | SELECT ?s WHERE { ?s rdf:type :Product } | 3
testCountofFRU() counts the number of MEM1800-64CF cards connected to the network devices | SELECT * WHERE { ?s ?y :MEM1800-64CF } | 2
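A corresponding hedged sketch of a schema test in the spirit of testCountofClasses() is shown below. It only counts the distinct class fragments and omits writing them to test.csv, and the namespace, file name and in-memory repository are the same illustrative assumptions as in the previous sketch, not the exact Appendix 3 listing.

import static org.junit.Assert.assertEquals;

import java.io.File;

import org.eclipse.rdf4j.query.TupleQuery;
import org.eclipse.rdf4j.query.TupleQueryResult;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
import org.junit.Before;
import org.junit.Test;

public class SchemaTest {

    private final Repository repository = new SailRepository(new MemoryStore());

    @Before
    public void loadOntology() throws Exception {
        repository.init();
        try (RepositoryConnection connection = repository.getConnection()) {
            connection.add(new File("ontology.ttl"),
                    "http://example.org/ontology#", RDFFormat.TURTLE);
        }
    }

    @Test
    public void testCountofClasses() {
        try (RepositoryConnection connection = repository.getConnection()) {
            TupleQuery query = connection.prepareTupleQuery(
                    "PREFIX owl: <http://www.w3.org/2002/07/owl#> "
                  + "SELECT DISTINCT (strafter(str(?s), \"#\") AS ?fragment) "
                  + "WHERE { ?s a owl:Class }");
            int count = 0;
            try (TupleQueryResult result = query.evaluate()) {
                while (result.hasNext()) {
                    result.next();
                    count++;
                }
            }
            // The ontology is expected to contain exactly nine classes.
            assertEquals(9, count);
        }
    }
}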

The problem with the CSV format is that there is no single definition, and it is difficult to identify the methods that should be used to create and interpret the file. The CSV file separates the data fields with commas, but the fields themselves can also contain commas or embedded line breaks. CSV parsers may not process such field data or may require quotation marks around the field. Because the fields can themselves contain embedded quotation marks, the CSV implementation may also need escape characters.
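As a hedged illustration of the quoting convention most CSV implementations follow (fields containing commas, quotation marks, or line breaks are wrapped in quotation marks, and embedded quotation marks are doubled), the small helper below is only a sketch; the method name and the example value are illustrative and not part of the prototype.

public final class CsvEscape {

    // Wrap a field in quotation marks when needed and double any embedded quotes.
    static String escapeField(String field) {
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Prints: "4-port 10/100BASE-T card, ""spare"""
        System.out.println(escapeField("4-port 10/100BASE-T card, \"spare\""));
    }
}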

5.3.2 Validation

RDFUnit is a test-driven data debugging framework that can be run automatically based on schema test cases. All test cases are performed as SPARQL queries using a pattern-based transformation method. The validation begins by uploading the ontology as a Turtle file to the RDFUnit Test demo page [57]. When the ontology validation is selected, the validator executes test generators, automatically creates RDFUnit test cases, and executes them against the data set, as shown in Table 9.

Table 9. The RDFUnit test report.

Results
Total test cases: 123
Succeeded: 121
Failed: 2 (warnings)
Violation instances: 285
Failed test cases:
http://www.w3.org/2000/01/rdf-schema#label is missing proper range
http://www.w3.org/2000/01/rdf-schema#comment is missing proper range

Domain and range are the two most frequently expressed axiom types in ontologies, and they therefore produce many automated test cases and good test coverage. Consequently, the most common errors in all ontologies arise from rdfs:domain and rdfs:range test cases, but such errors alone cannot classify an ontology as poor. [59] In light of these considerations, the reported errors were judged to have little effect on the quality of the ontology and can be corrected in subsequent iterations.

The final validation step is to confirm that the requirements are satisfied. This activity included presenting the prototype to the domain experts at a meeting and gathering their feedback on the following questions.


Question 1. Does the ontology fully meet the original requirements defined in the competency questions?

The test cases in Figure 31 were compared to the competency questions in Table 10 and it was concluded that the ontology has been developed according to the specification.

Table 10. The list of the validated competency question.

# Question Answer Correct?

1 Is software identity a managed system

element? yes yes

2 Is physical element a managed system

element? yes yes

3 Is software identity a physical element? no yes

4 Is location a managed system element? yes yes

5 Is product a managed element? yes yes

6 Is FRU a managed element? yes yes

7 Is managed system element a system

element? yes yes

8 Is location a managed system element? yes yes

9 Is product managed element or managed

system element? managed element yes

10 Is FRU managed element or managed system

element? managed element yes

Question 2. Is the ontology intuitive and easy to describe and understand?

The ontology consists of only nine classes, and the class tree is small and intuitively understandable, as noted in the class definitions in Figure 21. However, the open world assumption and the possible disjointness axioms and restrictions between classes and individuals are difficult to understand without the help of examples and illustrations. In particular, distinguishing between restrictions such as hasSWID someValuesFrom SoftwareIdentity and hasSWID allValuesFrom SoftwareIdentity can be confusing as the ontology expands: the former requires that at least one hasSWID value is a SoftwareIdentity, whereas the latter only requires that, if any hasSWID values exist, they are all SoftwareIdentity individuals.


Question 3. Can components be modified without affecting other components?

Because the ontology does not contain polyhierarchies, illustrated as a concept in Figure 33, modifying an entity does not have unexpected effects on other entities. In large ontologies, polyhierarchy is often inevitable, and therefore a large ontology should be divided into multiple ontologies whose data are extracted and merged with federated SPARQL queries, as sketched after Figure 33.

Figure 33. The polyhierarchy as a diagram.
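As a hedged illustration of such merging, the query string below sketches how a federated SPARQL 1.1 query could combine local data with a second ontology exposed at a separate endpoint. The endpoint URL, the namespace and the class name are assumptions made only for this example.

public class FederatedQueryExample {

    // Illustrative only: the SERVICE clause delegates part of the graph pattern
    // to a remote (hypothetical) SPARQL endpoint and merges the results.
    static final String FEDERATED_QUERY =
            "PREFIX : <http://example.org/ontology#> "
          + "SELECT ?element ?location WHERE { "
          + "  ?element a :ManagedSystemElement . "
          + "  SERVICE <http://locations.example.org/sparql> { "
          + "    ?element :hasLocation ?location . "
          + "  } "
          + "}";
}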

Question 4. Can the ontology be easily extended beyond network devices?

The CIM Schema is a conceptual framework for describing computer elements defined as CSCIs and HWCIs. Because the ontology is based on the vocabulary and classes of the CIM Schema, it can easily be extended beyond network devices to other software-intensive systems.

Question 5. How easy is it to detect deficiencies, inconsistencies and components to be repaired?

Although the ontology attempts to follow good design practices, such as avoiding polyhierarchy and keeping the classes in a tree, it is very likely that some inconsistencies, misrepresentations, deficiencies, and incompatibilities remain. Most or all of these failures will surface at some point in the actual context of using the ontology. Some failures can be avoided with the test-driven ontology development and unit testing processes, but these do not guarantee flawlessness.

