
Erasmus Mundus Master’s Program in Pervasive Computing & Communications for Sustainable Development (PERCCOM)

Amir Rahafrouz

DISTRIBUTED ORCHESTRATION

FRAMEWORK FOR FOG COMPUTING

Supervisors:

Assistant Professor Karan Mitra (Luleå University of Technology)

Assistant Professor Saguna (Luleå University of Technology)

Chaired Professor Christer Åhlund (Luleå University of Technology)

Examiners:

Professor Eric Rondeau (University of Lorraine)

Professor Jari Porras (Lappeenranta University of Technology)

Associate Professor Karl Andersson (Luleå University of Technology)


sustainable development.

This thesis has been accepted by partner institutions of the consortium (cf. UDL-DAJ, n°1524, 2012 PERCCOM agreement).

Successful defence of this thesis is obligatory for graduation with the following national diplomas:

• Master in Complex Systems Engineering (University of Lorraine)

• Master in Pervasive Computing and Communications for Sustainable Development (Luleå University of Technology)

• Master of Science in Technology (Lappeenranta University of Technology)


LUT University

School of Engineering Science

Erasmus Mundus Masters in Pervasive Computing and Communications for Sustainable Development (PERCCOM)

Amir Rahafrouz

Distributed Orchestration Framework For Fog Computing Master’s Thesis

106 pages, 45 figures, 5 tables.

Keywords: Edge Computing, Fog Computing, IoT, FiWare, Context, Cloud Computing, Task Offloading, Distributed Monitoring.

The rise of IoT-based systems is making an impact on our daily lives and environment.

Fog computing is a paradigm for utilizing IoT data and processing it at the first hop of the access network instead of in distant clouds, and it promises many new applications. A mature framework for fog computing is still lacking today. In this study, we propose an approach for monitoring fog nodes in a distributed system using the FogFlow framework. We extend the functionality of FogFlow by adding monitoring of Docker containers using cAdvisor. We use Prometheus for collecting the distributed data and aggregating it. The monitoring data of the entire distributed system of fog nodes is accessed via an API from Prometheus. Furthermore, the monitoring data is used to rank the fog nodes in order to choose where to place serverless functions (fog functions). The ranking mechanism uses the Analytical Hierarchy Process (AHP) to place the fog function according to the resource utilization and saturation of the fog nodes' hardware.

Finally, an experimental test-bed is set up with an image-processing application that detects faces. The effect of our ranking approach on the Quality of Service is measured and compared to the current FogFlow.


I want to express my sincere gratitude to my supervisors, Professor Karan Mitra, Professor Christer Åhlund, and Professor Saguna, for their valuable feedback along the way and for all of the discussions that cleared the path. I appreciate the time they spent helping me and lighting the path for me.

I would like to thank Professor Karl Andersson for his heartwarming support, especially during the last semester in Skellefteå.

I appreciate the help and comments from Dr. Bin Cheng from NEC Laboratories Europe. His guidance on the technical aspects was invaluable, and I could not have finished this thesis without his help.

The research reported here was supported and funded by the Erasmus Mundus Joint Master's Degree (EMJMD) in Pervasive Computing and COMmunications for Sustainable Development (PERCCOM) [73]. The authors would like to express their gratitude to all the associate partners, sponsors, and researchers of the PERCCOM Consortium.

Last but not least, I would like to give special thanks to my marvelous friends in PERCCOM for making this master program an extraordinary experience: An, Anastasiia, Anisul, Askar, Daniel, Darren, Feruz, Florian, Ijlal, Jose, Krishna, Maliha, Mansour, Meru, Orsola, Sami, Sunnat, Valeria. Merci Kheili!


ABSTRACT ii

ACKNOWLEDGEMENTS iii

LIST OF FIGURES viii

LIST OF TABLES ix

1 Introduction 1

1.1 Background . . . 1

1.2 Motivation . . . 1

1.2.1 Motivating Scenario . . . 3

1.2.2 Sustainability Motive . . . 4

1.3 Problem Definition . . . 6

1.3.1 Research Questions . . . 6

1.3.2 Research Aim . . . 6

1.4 Delimitation . . . 7

1.5 Contribution . . . 8

1.6 Thesis Organization . . . 9

1.6.1 Research Methodology Overview . . . 9

2 Related Work And Background 11

2.1 Introduction . . . 11

2.2 Edge Computing . . . 12

2.2.1 Fog or Edge . . . 13

2.2.2 PaaS for Fog Computing . . . 13

2.3 Orchestration . . . 14

2.3.1 Standardizing Computation and Communication . . . 15

Virtualization Types . . . 15

Next Generation Service Interface . . . 16

2.3.2 Monitoring . . . 16

2.3.3 Decision-Making Framework . . . 17


2.4 Sustainability Aspect . . . 19

2.5 Fog Computing Platforms . . . 20

2.5.1 Fog Computing in IoT . . . 21

2.5.2 Platforms . . . 21

2.5.3 GeeLytics . . . 23

Other Analytics Platforms . . . 24

2.5.4 FogFlow . . . 24

Context Management . . . 25

Subscription . . . 26

Create an IoT Device . . . 27

2.6 FogFlow as the Selected Platform . . . 27

2.6.1 Overall Architecture of The FogFlow Components . . . 29

Master (Cloud) . . . 29

Designer (Cloud) . . . 29

Discovery, DB(Cloud) . . . 30

RabbitMQ (Cloud) . . . 30

Broker (Cloud, Edge) . . . 30

Worker (Edge) . . . 31

Operator (Edge) . . . 31

2.6.2 Serverless Task: Fog Function . . . 31

2.7 Prometheus . . . 32

2.8 Conclusion . . . 34

3 Theory and Conceptual Design 35

3.1 Monitoring Edge Nodes . . . 35

3.1.1 Availability of Metrics . . . 35

3.1.2 Metrics in the system . . . 36

Four Golden Signal . . . 36

USE Method . . . 37

RED Method . . . 38

3.1.3 Metric Selection . . . 38

3.2 Ranking Fog Nodes . . . 40

3.3 Problem Definition . . . 40

3.4 Analytical Hierarchy Process . . . 41


4 System Design and Implementation 44

4.1 Monitoring: LogFlow . . . 44

4.1.1 cAdvisor (LogFlow on the Edge) . . . 46

4.1.2 Prometheus (LogFlow in the Cloud) . . . 47

Configuring Prometheus . . . 47

4.1.3 Other Alternatives . . . 48

4.1.4 Deployment as a FogFlow Component . . . 49

Deploying cAdvisor . . . 49

Deploying Prometheus . . . 49

4.1.5 Communication between Master and Edge Node . . . 50

4.1.6 User-defined Metrics . . . 51

4.2 Ranking . . . 51

4.2.1 Ranking Implementation . . . 52

Integration to the FogFlow . . . 53

4.3 Application (FaceCounter) . . . 53

4.3.1 Workload Types . . . 54

Video Conversion . . . 54

Counting Faces . . . 54

4.3.2 Shipping as a Container . . . 54

4.4 A Note on Network-related Metrics . . . 55

A Note on Benchmarking . . . 56

4.5 Summary . . . 56

5 Evaluation, Results and Discussion 57

5.1 Test-bed Setup . . . 57

5.1.1 Workload Specification . . . 57

5.1.2 Assumptions & Requirements . . . 58

5.1.3 Experiment Sequence . . . 59

5.1.4 Experiment Objective . . . 62

5.1.5 Deployment Setup . . . 62

5.1.6 Geographical Positioning of Entities and Fog Nodes . . . . 63

5.1.7 Data Gathering . . . 66

Baseline Config . . . 66

Gathered Metrics . . . 66

Evaluation Parameters . . . 66


5.4 Results . . . 69

5.4.1 Dynamic Function Placement . . . 69

Summary of Experiment . . . 69

Execution Time . . . 69

Network Throughput . . . 71

Effect of Tu on Execution Time . . . 71

5.5 Discussion . . . 73

5.5.1 Sustainability Aspect . . . 74

5.6 Sustainability Analysis of Impacts . . . 74

5.6.1 Technical . . . 75

5.6.2 Economical . . . 75

5.6.3 Social . . . 77

5.6.4 Individual . . . 77

5.6.5 Environmental . . . 78

6 Conclusion and Future Work 79

6.1 Conclusion . . . 79

6.2 Future Work . . . 80

References 81

A Appendix 98

A.1 Cloudwatch Screenshots . . . 98

B Appendix 104

B.1 Full Experiment Results . . . 104

Figure 1 Motivating Scenario. . . 5

Figure 2 Research Questions. . . 7

Figure 3 Methodology of the Research. . . 10

Figure 4 Edge of the Network. . . 12

Figure 5 FogFlow Overall Architecture. . . 25

Figure 6 NGSI10 Query Context Advanced Provider . . . 26

Figure 7 NGSI10 Query Context Simple Provider . . . 27

Figure 8 NGSI Subscription and Notification of IoT Devices . . . 28

Figure 9 Create an IoT Device . . . 29

Figure 10 Basic FogFlow Components. . . 30

Figure 11 Registering the Fog Function in the Fog Flow system. . . . 32

Figure 12 Trigger method of a fog function based on Contextual Data (Generated by IoT Devices.) . . . 33

Figure 13 Analytical Hierarchy Process . . . 43

Figure 14 Very abstract view of the system. . . 45

Figure 15 LogFlow Architecture. . . 45

Figure 16 docker-compose.yml at Edge. . . 49

Figure 17 config.json at Edge. . . 49

Figure 18 Required changes in the config.json for deployment of Prometheus. . . 50

Figure 19 Communication Between Edge and Master. . . 51

Figure 20 Prometheus and Master Interaction . . . 52

Figure 21 Benchmarking Request Lifecycle. . . 55

Figure 22 Timing Configurations. . . 59

Figure 23 Sequence of Evaluation Experiment . . . 61

Figure 24 Geographical Map of Fog Nodes and Entities . . . 64

Figure 25 The Environment Setup of Evaluation. . . 68

Figure 29 Sustainability Analysis of Impact. . . 76


Table 1 Detailed Steps of the Design Science Research Methodology. 10

Table 2 Proposed Metrics for Ranking Edge nodes. . . 39

Table 3 The MCDM problem of ranking edge nodes. . . 42

Table 4 Fog Nodes Coordination . . . 65

Table 5 Entities (Cameras) Coordination . . . 65

This chapter lays the ground on which this research has thrived. The background, applications, and problems are explained. The rest of the chapter covers the main contributions and the methodology of the research. Finally, the structure of this document is presented as an outline.

1.1 Background

The ever-increasing number of connected devices forming the Internet of Things is bringing about new challenges. Harnessing billions of connected devices, ranging from the smallest sensors to massive computers, requires new techniques and engineering perspectives on top of the current Internet's challenges. Ericsson forecasts that IoT connections will reach 22.3 billion by 2024 [35].

To keep up with the diverse demands of various IoT applications, such as QoS requirements, latency, privacy, and scalability, current cloud-based solutions are not enough [132]. Fog computing bridges the gap between cloud computing and IoT devices by offloading computation from the cloud to an intermediate layer along the spectrum from the cloud to the IoT devices. As defined by the OpenFog Consortium [92]: "Fog is a system-level horizontal architecture that distributes resources and services of computing, storage, control, and networking functions closer to the user, anywhere along the continuum from Cloud to Things."

1.2 Motivation

Data processing using centralized solutions at datacenters has inherent limitations, such as geographical distance. New applications relying on dispersed IoT devices bring their own challenges.

Each IoT device can process data by itself, or send it to a server to be processed.

Fog computing is the paradigm that aids us in finding a middle ground: a large number of small data centers at the edge of the network that benefit from the unique advantages of IoT devices and tackle various challenges [106], such as:

Latency: Latency is the primary motivation of fog computing in general. Many IoT applications require a rather low service delay. In 5G standardization, the "ultra-reliable low-latency" delay requirement is defined as 1 ms [79]. However, the average transatlantic round-trip time between London and New York was 69.93 ms in March 2019 on the Verizon Enterprise network [67]. Meeting such strict conditions requires perfect orchestration of the tasks on the edge nodes.

Bandwidth: Serving requests at the edge of the network saves bandwidth between the edge and the core network. The bandwidth saving results not only in cost reductions, but also in less CO2 emitted by network devices.

Availability: Cloud services can deploy fallback nodes at the edge of the network for cases of failure in core network connectivity. This technique is already prevalent among Content Distribution Networks (CDN). Having an instance of a service at the edge of the network increases the availability of the application. Setting task priorities can hint the orchestrator to provision the job over edge nodes proactively [7].

Ownership and Privacy: Fog datacenters can be owned and administered by the users. That gives users the freedom to choose between vendors and prevents problems such as vendor lock-in. Depending on the application, sensitive data can be processed at the fog node under the user's authority instead of being sent to a third party in the cloud for processing. Also, a privacy firewall in the fog can prevent sensitive data from being carried out of the network according to user-defined rules. Such rules are negotiable in the SLAs, and the deployment could be done in the edge-computing system [5].

We mentioned the high-level motivations for a fog computing system. Addressing such concerns introduces new technical challenges, such as monitoring different elements over the network or the mechanism of allocating resources to computations.

Monitoring resources is a crucial step for deploying any system. In densely populated areas with a lot of IoT data and fog nodes, having an understanding of the system is crucial for any task, whether automated or manual.

Choosing a fog node to place a task (serverless function) on is another open and challenging problem in a distributed system, and it is the one we focus on in this research. An example scenario is illustrated further to provide more insight into the problem.

1.2.1 Motivating Scenario

A typical high-level scenario could be IoT devices (such as sensors) that produce streams of data at the edge of the network, along with other IoT devices (such as actuators) at the same edge of the network ingesting the data. Together, such devices form a closed control loop. The usual approach would be to process the data in the cloud, but the processing logic can be offloaded to the edge of the network [120].

Possible use-cases of resource-intensive task placement over fog nodes can be found in smart industry, real-time monitoring, patient monitoring, self-driving cars, Augmented Reality (AR), and Virtual Reality (VR).

The motivating scenario that we design is later deployed as the experimental testbed, and the results are used to evaluate the design.

To better convey the motivation of this research, we take the scenario of a policeman with a camera as an IoT device. The camera can be attached to the uniform or the police car. The camera takes pictures at a specified rate and uploads them to storage. The camera stream, or "camera context", is produced regularly, regardless of how the data is processed.

On the consumer side, a policeman can have a headset as an IoT device that is subscribed to notification data as an "alarm context".

Processing the data, more specifically converting the "camera context" to an "alarm context", should happen somewhere. Getting an understanding of the system via monitoring data and choosing where to process the data are the two most important parts that we focus on in this thesis. We believe that this process can be transparent to any data (context) producer or consumer.

As another example, consider a service to count the number of faces in the camera stream or find a specific face in images. Such a service could be used to identify a threat or find a lost child. The cloud computing paradigm would incline us toward designing a service and deploying it in a distant cloud datacenter for processing, which may not satisfy the requirements of the application.

The requirements in terms of minimum latency of processing pictures, the bandwidth costs, and unstable network conditions can make data processing in the cloud tricky. Moreover, since the "camera context" contains private data of people's faces, there might be privacy concerns. In the case of attaching a camera to a policeman's uniform, other limitations such as battery capacity may make it infeasible to attach a powerful processing unit to a law enforcement officer.

The challenge is: which fog node should be selected to offload the data processing to? In case of an incident, several policemen might be dispatched to a small geographical area. What if the fog resources in that specific region are not enough for this amount of data? How can the best quality of service be provided with the available resources at hand? In this case, the load could be spread over different fog nodes in order to prevent saturating the resources of a specific fog node.

The selection of the fog node is depicted in Figure 1. Should the data of the user be offloaded to Fog 1, which is geographically closer, or to Fog 2, which has more available resources? Which one provides a better quality of service for the user? It is important to note that the user is not involved in the selection process; the task assignment shall be transparent to the user.

Fog computing was meant to provide a solution for such scenarios. A solution could be provisioning several fog nodes at different places in the city, close to the areas where policemen are likely to be. Most of the challenges of our use-case mentioned earlier could be alleviated by offloading the task to a fog node that is both close enough and fast enough. We quantify our goal by measuring quality of service.

1.2.2 Sustainability Motive

The major motivation of the thesis is the sustainability aspect of this research. Every IT system has direct and indirect consequences on people and the environment. The effects of the results of this research from a sustainability point of view are discussed in Chapter 5. The framework for analyzing the effect of this research is based on the model proposed by Becker et al. [9].

Figure 1: Motivating Scenario.

Environmental: Fog computing may result in a reduction of CO2 emissions. Fog computing is distributed by nature, and renewable sources of energy are usually also scattered. Example studies in this regard are available in Chapter 2.

Economical: Reduced bandwidth usage brings about cost savings. Moreover, fog nodes can be under the administration of users, which gives users the freedom to evaluate their options from an economic aspect.

Social & Individual: The focus of this research is specifically on improving Quality of Service, on which Quality of Experience depends. Improved QoE can also lead to new use-cases. For example, data protection laws differ across countries, and there might be various policies regarding the handling of transborder business data flows [16]. From the individual perspective, we can count cost savings and new applications, especially in the privacy protection domain.

1.3 Problem Definition

The problem was partially outlined in the motivation section. The research questions and thesis goals are as follows.

1.3.1 Research Questions

As illustrated in Figure 2, the main research question is about orchestration in fog computing. To clarify the term, we focus on two aspects of orchestration:

• How can monitoring be performed in a distributed system of heterogeneous fog nodes and other nodes in the cloud? More specifically, what technology can be used to implement such a monitoring platform, and how can it be integrated into a state-of-the-art fog computing system?

• Using the monitoring data, how can a ranking of edge nodes be identified for each task (serverless function) in order to achieve an acceptable quality of service for the task? What method can be used for the ranking, and how can it be implemented and integrated with a state-of-the-art fog computing system?

1.3.2 Research Aim

This thesis aims to develop a system for QoS-aware task placement in a fog-computing environment by monitoring fog nodes and ranking them.

We contribute to the orchestration process by adding QoS metrics to the decision-making process in the context of an IoT system, and more specifically a smart city. The best fog node is selected using the Analytical Hierarchy Process. The implementation shall be done in a way that is compatible with one of the state-of-the-art fog computing platforms.

Figure 2: Research Questions.

1.4 Delimitation

In our case, "data processing" means placing resource-intensive tasks (serverless functions) on the appropriate processing node in the scope of a smart city.

The concept of fog computing is still at an early research stage and has not been widely adopted by industry [12]. This can be due to immature frameworks for fog computing and the lack of a widely used IoT framework. FiWARE is picked as the baseline platform for the implementation due to its promising community support. Several cities are using FiWARE as their IoT platform, which appears reassuring [113]. However, it is important to state that the scope of this thesis is focused on two parts: adding a monitoring feature to the fog computing platform, and providing a method for QoS-aware task placement in it. For this thesis:

• We did not develop a fog computing platform from scratch, but instead we built on top of an existing platform called FogFlow.

• For the ranking of fog nodes, we did not perform a comprehensive analysis of the effect of all of the metrics. We provided a solution that outperforms the current one. A more detailed study is suggested as future work.

• For the orchestration metrics used for the ranking, we focused on metrics related to the hardware of the edge nodes, such as CPU and memory utilization. The scope of this thesis did not allow for an in-depth analysis of the various network metrics that affect QoS.

• We did not perform a comprehensive comparison of different fog computing platforms. Due to time constraints, FogFlow was chosen for this thesis.

It is crucial to note that the limitations of the FiWARE and FogFlow systems constrain this research as well. We did not build a fog computing platform; we contributed to an existing one. Fog computing applications can be fully developed using IoT devices that are compatible with the FiWARE platform, and it is ready for application developers to build on using the fog computing paradigm. However, a mature, industry-ready solution requires more analysis and development along the lines of our effort in this thesis.

1.5 Contribution

The outcome of our work is developed on top of the FogFlow system. FogFlow is a platform for processing the context data of IoT devices. It allows contextual data to be processed and transformed in the form of serverless functions over fog nodes.

We provided monitoring capability to FogFlow by identifying the alternatives and deploying the chosen solution.

We also provided decision-making capability using the Analytical Hierarchy Process (AHP) to choose the appropriate fog node for data processing. The monitoring part has been integrated into the main branch of the FogFlow project repository since June 2019. In the new version of FogFlow, at the time of writing this thesis, new QoS features are a work in progress, being developed to enable the FogFlow system with a new intent-based programming paradigm with inherent QoS features.

We contribute to fog computing systems, and more specifically the FogFlow framework, by:

• designing and developing a framework for gathering metrics in a distributed system of fog nodes;

• developing and evaluating an orchestrator extension which enriches the decision-making capability of the FogFlow platform through ranking fog nodes.

1.6 Thesis Organization

This thesis comprises six chapters. Chapter 1 presents the background information, the main motivations, and the problem definition. Chapter 2 covers the state-of-the-art literature on the problems; the primary references are explained in more detail to lay the ground for the upcoming chapters, and the FogFlow system used in this research is described there in more detail. Chapter 3 covers the theoretical basis of the thesis on the monitoring of a distributed system and orchestration algorithms. Chapter 4 revolves around the details of the implementation extending the FogFlow system and the testbed infrastructure provided for the experiments. Chapter 5 presents the attained results and the evaluation method that was employed, and discusses the results. Chapter 6 concludes the thesis and outlines future work.

The macroscopic methodology of the research is discussed in this part; the detailed recipe of this research is available in Chapters 3 to 5.

1.6.1 Research Methodology Overview

We performed this research within the framework of a design science research methodology [98]. The practical implementation methodology of this research, including the technologies tested, is explained in Chapters 3 to 5. The overall process that we followed during this research is depicted in Figure 3. Table 1 shows the steps of the methodology and the significant research milestones at each stage.

Table 1: Detailed Steps of the Design Science Research Methodology.

1. Problem Identification and Motivation
   Process: Identify the problem; define the motivation; limit the scope; literature review.
   Milestone: Defined the scope to use FiWARE as the IoT framework; use FogFlow to build upon.

2. Define the Objectives for a Solution
   Process: Review the state of the art; define objectives; investigate tools; define the theory.
   Milestone: Identified edge ranking for task offloading as the core problem; designed the abstraction of a system for distributed monitoring.

3. Design and Development
   Process: Identify the tools; develop the solution; integrate with FogFlow; identify limitations.
   Milestone: Developed using the ELK stack; understood the limitations of ELK; refined the architecture to use Prometheus and cAdvisor; generated deployment scripts.

4. Demonstration
   Process: Apply the theoretical algorithm; investigate the suitability of metrics.
   Milestone: Used the Analytical Hierarchy Process (AHP) as the multiple-criteria decision-making process.

5. Evaluation
   Process: Evaluate the system; set up the testbed; identify limitations of the experiment; compare to the baseline.
   Milestone: Performed the evaluation on AWS.

6. Communication
   Process: Publish results to GitHub; merge into FogFlow; publish a scientific paper.
   Milestone: github.com/smartfog/fogflow

Figure 3: Methodology of the Research.

This chapter reflects on the state-of-the-art background and related work that contributes to this research. We cover literature around the scope of this work, which is IoT systems in the context of smart cities, and possible applications of this work. Moreover, we review available edge-computing frameworks to choose one as the groundwork of this work. Furthermore, we discuss the literature on the theoretical foundation of this work regarding decision making and orchestration.

2.1 Introduction

The growing number of connected devices is changing the way we live, work, interact, and socialize. These devices, forming the Internet of Things (IoT), are added to our toolbox to improve the quality of our lives. The surrounding environment is becoming smarter. According to Mark Weiser, one of the renowned names in ubiquitous computing, the smart environment can be defined as [121]: "The physical world that is richly and invisibly interwoven with sensors, actuators, displays, and computational elements, embedded seamlessly in the everyday objects of our lives, and connected through a continuous network." The applications of ubiquitous computing may differ from the implementations that are available at the moment. As in most cloud-based applications, data is sent to the data-center for processing, storage, decision-making, and analysis [133].

However, to address issues regarding latency, bandwidth, and privacy, a computational paradigm is required that enables placing computation in the close vicinity of IoT devices. It is worth mentioning that while there exists extensive research in the field of pervasive computing, this research is focused on QoS-aware task placement on edge nodes by ranking them. In this chapter, the primary literature on edge computing and its different definitions is covered first. Then we review different views on orchestration in cloud computing and IoT systems. Later on, we cover state-of-the-art literature, products on the market, and the open-source community on logging and metering distributed systems. In the next part, we review the available research on decision-making for extracting information from the data we gather using our metering tool. We wrap up by connecting the dots and explaining how our work leverages current research to be a step forward.

Figure 4: Edge of the Network.

2.2 Edge Computing

Edge computing, fog computing, cloudlets, and mist computing are used interchangeably in some papers; however, "edge" and "fog" seem to be the common general terms. In a telecommunications environment, "edge" refers to base stations, RANs, and Internet Service Provider access networks [81], while in the IoT domain, the edge indicates the network encompassing sensors and other IoT appliances. To accomplish "edge computing," the computation shall be arranged near the first hop from the IoT device. If the IoT devices themselves do the computation, the terminology becomes "mist computing." According to General Electric [122], edge computing is the technology that is attached to the "things," while fog computing revolves around the interaction of edge devices, network elements, and gateways [130][78].

2.2.1 Fog or Edge

An exhaustive comparison between fog computing, edge computing, mist computing, mobile edge computing, mobile cloud computing, and other paradigms is available in [130].

The lines between the different terms in this area are blurry; however, in this thesis, we prefer to use the term fog computing as it is the most general one, encompassing the other terms as well.

The main difference that is worth noting here is the difference between fog and edge. If we take the formal definition of fog from the OpenFog Consortium [92], it is more accurate to say that fog computing encompasses the hierarchy of services ranging from cloud to things. According to [19]: "fog seeks to realize a seamless continuum of computing services from the cloud to the things rather than treating the network edges as isolated computing platforms." [130] Our work here also lies in the category of fog computing rather than edge computing.

2.2.2 PaaS for Fog Computing

Based on the definitions in the previous section, we can see that research in fog computing is moving toward enabling a general framework that hides technical configuration from users and developers. In other words, the vision of fog computing research seems to be moving toward having fog platforms as a service. To date, no stable platform has been developed to process IoT contextual data seamlessly across the cloud and the edge of the network.

The authors of [128] propose a PaaS model for fog computing. However, the design requirements are derived from a specific use-case and cannot fit a general-purpose IoT framework. The research done in [95] is another example, with a Raspberry Pi prototype. There have been other papers proposing platforms for fog computing with prototypes on different setups. We believe that the concept is feasible and that now is the time to move further toward tackling the more fine-grained production challenges of fog computing, from prototypes to systems.

2.3 Orchestration

The term orchestration has different meanings depending on the context, and different orchestration frameworks are suggested in academia for different use-cases; [72], [77], [50], [129], [96], [25], [75], [104], [114] are some examples. An abstract categorization of orchestration operations that is compatible with the current state of the art can be found in [103]. According to Ranjan et al. [103], orchestration of cloud resources comprises:

Selecting resources (design time and runtime): Deciding whether a resource can be selected for the required functionality while satisfying specific resource requirements and constraints (for example, geographical location, current utilization, network situation). Subsequently, a suitable resource (fog node) is selected to be allocated to the software's demands.

Deploying resources (design time and runtime): This operation comprises instantiating software resources on cloud and edge services and configuring them for communication and interoperation with other software resources. Announcing a new IoT broker or a new edge node to the network is a notable example of this orchestration operation in the FogFlow framework.

Monitoring resources (runtime): Monitoring the QoS attributes of applications and hardware resources concerns gathering metrics from different nodes and possibly detecting event patterns (such as a load spike) from the data generated by deployed resources (for example, CPU utilization metrics).

Controlling resources (runtime): Based on monitoring data, a resource orchestrator can react to changes in service behavior and act correctively upon them, for example by provisioning more resources in the cloud or forwarding processing to other edge nodes to preserve the current QoS level for other users.

One of the essential pillars of orchestration is monitoring, since it is a prerequisite for the other actions. Without real-time feedback about the system, automatic orchestration is not possible. With correct data, system managers can specify correct policies for the network, and if any pattern emerges from the tasks, it can be automated.
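To make the interplay of these operations concrete, the sketch below shows a minimal monitor-rank-deploy-control loop in Python. It is purely illustrative: the node names and helper functions are hypothetical placeholders under our own assumptions and do not correspond to FogFlow's actual implementation.

```python
"""Minimal illustrative orchestration loop: monitor node metrics, rank nodes,
deploy pending tasks on the best node, and react to saturated nodes."""
import random
import time

NODES = ["edge-1", "edge-2", "cloud-1"]   # hypothetical node names

def fetch_metrics(node):
    # Stand-in for a real monitoring call (e.g. a Prometheus query).
    return {"cpu": random.random(), "mem": random.random()}

def rank(metrics):
    # Lower combined utilization ranks first; a stand-in for a real ranking scheme.
    return sorted(metrics, key=lambda n: metrics[n]["cpu"] + metrics[n]["mem"])

def orchestrate(tasks, rounds=3):
    for _ in range(rounds):
        metrics = {n: fetch_metrics(n) for n in NODES}         # monitoring
        ordered = rank(metrics)                                # selection
        while tasks:
            print(f"deploying {tasks.pop()} on {ordered[0]}")  # deployment
        for n in NODES:                                        # control
            if metrics[n]["cpu"] > 0.9:
                print(f"{n} is saturated; avoid it for future placements")
        time.sleep(0.1)

orchestrate(["facecounter", "video-converter"])
```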


2.3.1 Standardizing Computation and Communication

Applications that are deployed on fog nodes shall be considered as black boxes with a standard format. The solution developer should focus on application development, and the underlying complexities of the infrastructure setup should be invisible to them. Virtualization of operating systems was the fundamental enabler of cloud computing [15]; however, different types of resources can be virtualized.

Virtualization Types

Virtualization-based infrastructure: A network of fog nodes can be deployed using SDN for better scalability and lower operational costs. Studies such as [76] have worked on integrating the control plane of network resources into fog nodes, and [109] is a study on integrating fog into Radio Access Networks (FRAN). Provisioning resources at such a fine level would be limited to more specific use-cases and would lose generality. However, the orchestrator can use the metrics, logs, and events of such a platform to perform higher-level decision making on the ranking of edge nodes.

Virtualization technology for application development: For IoT applications, Docker containers are a more lightweight approach compared to virtual machines, and more portable than custom application development. A container-oriented service provisioning framework is [84]; their view is that lightweight containers can help IoT devices leverage the available resources. There is, however, research that goes deeper into customizing embedded systems for IoT. Researchers in [23] have proposed a framework called FADES, which uses MirageOS unikernels to obtain a lightweight OS. This seems like a promising direction for IoT edge computing; however, the paradigm is not mature enough for our work. Other research regarding the performance of containers for IoT systems is available in [84], [82], [83].

Next Generation Service Interface

One of the major challenges in IoT is to have a unified data model for exchanging information between IoT devices. Large-scale IoT research projects such as FiWARE [42] and WISE-IoT [124] have implemented the NGSI standard. NGSI is an open standard interface used by academia and industry. NGSIv1 [90] is used as the data structure in FogFlow and in some other context brokers of the FIWARE project.

In 2016, the European Commission [22] requested ETSI to create an ISG (Industry Specification Group) to establish a standard Context Information Management API with FIWARE NGSI as a candidate. This request was published in the 2016 Rolling Plan for ICT Standardisation. At the beginning of 2017, the ISG was created by ETSI, and a preliminary draft of the ETSI NGSI-LD API specification was available in April 2018 [10]. FiWARE currently uses NGSIv2 as the standard for the context broker [43]. FogFlow uses NGSIv1 and plans to become compatible with ETSI NGSI-LD, the target standard for all FIWARE components, in the future.

2.3.2 Monitoring

There are different monitoring frameworks available for different systems. Docker containers can be monitored like any other process in Linux-based systems.

For the sole purpose of getting metrics of Docker containers, Google provides a tool called cAdvisor [53]. Another popular framework for monitoring distributed systems is the set of tools Elasticsearch, Logstash, Kibana, and Beats, usually referred to as the ELK stack. They provide tools for metric extraction, aggregation, visualization, and the search and archival of data. For extracting metrics from edge nodes, the required tool should be lightweight and flexible. Two candidate tools for this work are:

Beats [32] is software running on a machine that extracts metrics from the other services running on it. It can extract application-specific metrics and send them out. Metricbeat is a type of Beat specialized in metrics. It does not process the data itself; it merely collects the data and sends it to other nodes (usually Logstash) for further processing. As a part of the ELK stack, it can be seamlessly integrated with other components such as Logstash and Elasticsearch. It has a module called docker that can extract container-level information and send it to the Logstash instance(s).

cAdvisor [53] is a Docker container that, when run on a machine, collects metrics from the other containers on the same machine and provides an API to be used by other applications. It is more lightweight than Metricbeat.

We tested both Metricbeat and cAdvisor, and they both work fine. However, cAdvisor was chosen because of its lightweight footprint, its standard REST API, and its interoperability with the other parts of this system, in particular Prometheus.
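As a quick illustration of the kind of data cAdvisor exposes, the sketch below scrapes its Prometheus-format /metrics endpoint and keeps only the per-container memory samples. The address (localhost:8080, cAdvisor's default port) is an assumption about the deployment.

```python
"""Illustrative sketch: pull container metrics from a running cAdvisor instance
via its Prometheus-format /metrics endpoint."""
import urllib.request

def scrape_cadvisor(url="http://localhost:8080/metrics"):
    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode()
    # Each exposition line has the form: metric_name{label="value",...} sample_value
    return [line for line in text.splitlines()
            if line.startswith("container_memory_usage_bytes")]

for sample in scrape_cadvisor()[:5]:
    print(sample)
```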

2.3.3 Decision-Making Framework

The aggregated metrics are in the form of time-series data, and a decision-making framework to get insight from them is the essential part of orchestration for improving the Quality of Service. Using the monitoring data, a method is needed to improve the QoS. Here we look at two aspects of this problem:

First, any decision-making algorithm or method needs data to be fed into it, so we look at what type of technology is suitable to provide such information through a single API. Second, given the technology to access the data, what method should be used to improve the Quality of Service?

Technology

Even the most accurate decision-making algorithm is useless without correct data about the current state of the system. The question is what software technology to use for aggregating data in real time in a way that can support edge-computing orchestration. The choice depends on the underlying monitoring technology, so that the data can be aggregated properly. The options that we found suitable for this thesis are:

• Logstash/Elasticsearch: Works well with the Beats modules and can aggregate data from different sources. It can also be deployed in cluster mode for a system with many nodes.

• Prometheus: A lightweight time-series database and monitoring tool developed at SoundCloud [100]. It can be integrated with cAdvisor to collect metrics, and using its specific query language, PromQL, it is possible to query time-series datasets.

• OpenCensus: A project comprising a set of libraries to collect distributed metrics and traces [91]. It has exporters for different languages and can be integrated with various monitoring tools such as Prometheus or other products.

For this project, Prometheus was chosen due to its scalability features, mature framework, lightweight footprint compared to Elasticsearch, and interoperability with cAdvisor.
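The following sketch shows how such aggregated data can be pulled from Prometheus's HTTP API with a PromQL expression, here the per-node CPU rate of all containers. The Prometheus address and the instance label are assumptions about the deployment, not part of FogFlow.

```python
"""Illustrative sketch: run an instant PromQL query against Prometheus's HTTP API
(/api/v1/query) and return one value per monitored instance."""
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"   # assumed Prometheus address
PROMQL = 'sum by (instance) (rate(container_cpu_usage_seconds_total[1m]))'

def instant_query(expr):
    url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        body = json.load(resp)
    # Each result carries a label set and an (unix_timestamp, value) pair.
    return {r["metric"].get("instance", "?"): float(r["value"][1])
            for r in body["data"]["result"]}

print(instant_query(PROMQL))
```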

Technique

Using the data, there are countless directions from which to shed light on the problem of "improving Quality of Service."

There is a large body of literature, with different techniques, on how to perform decision making:

• Ranking Problem: The problem can be cast as ranking fog nodes according to their characteristics using metric data. Multi-Criteria Decision Making (MCDM) frameworks can be used to rank edge nodes. Such an approach has been used in [51] to rank cloud services using the Analytical Hierarchy Process (AHP); a small sketch of AHP-style ranking is given after this list.

• Optimization Problem: The ranking problem can be viewed as an optimization problem under various constraints. FogPlan [131] models edge node QoS as an optimization problem with the SLA as the constraint.

• Decision Making in an Adversarial Environment: In such works, edge nodes, cloud nodes, and sensors are considered rational and self-interested, and nodes interact to maximize their own utility. Utility theory is used in [2], [135] to model the interests of fog nodes, and normal-form game theory is used to model the interaction and analyze the equilibria of strategies. In such frameworks, new concepts may be introduced, such as reward, punishment, or price, depending on the method used [71], [85], [115]. Some works, such as [52], address the fairness of resource allocation without game theory.

• Decision Making in a Cooperative Environment: In contrast to adversarial settings, in these papers nodes form a coalition and take into account the utility of the group rather than of individuals. The Shapley value [108] of the coalition is a useful notion here when working with a group of edge nodes, as it indicates the distribution of the reward according to each individual's contribution. Such frameworks, however, may not be scalable due to the computation required, and usually heuristics are used.
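As a rough illustration of the MCDM ranking idea referenced above, the sketch below applies an AHP-style computation: a pairwise comparison matrix over three example criteria yields criterion weights from its principal eigenvector, and fog nodes are then scored by a weighted sum of their (inverted) utilization values. The criteria, comparison values, and node metrics are hypothetical and are not the exact setup used later in this thesis.

```python
"""Illustrative AHP-style ranking of fog nodes (hypothetical criteria and data)."""
import numpy as np

# Pairwise comparisons on Saaty's 1-9 scale: A[i, j] = importance of criterion i over j
# for the criteria (CPU utilization, memory utilization, CPU saturation).
A = np.array([[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]])

# The principal eigenvector of A gives the criterion weights.
eigvals, eigvecs = np.linalg.eig(A)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w = w / w.sum()

# Hypothetical measured values per fog node, scaled to 0..1 (lower is better).
nodes = {"fog-1": [0.80, 0.60, 0.30],
         "fog-2": [0.40, 0.50, 0.10]}

# Score each node by the weighted sum of its inverted metrics; higher is better.
scores = {n: float((1 - np.array(m)) @ w) for n, m in nodes.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```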

2.4 Sustainability Aspect

Regarding the technical benefits of this research, we can say that there are very few studies on the energy modeling of fog computing, since it is a new area. From the previous literature, however, related work can be grouped into several categories [130]:

• IoT device federation for reducing energy consumption

• Energy-aware computation offloading

• Energy-aware mobility management

According to a study on the energy impact of fog computing [68], the number of network hops between the user and the data has little impact on energy. Their research has shown that factors such as the type of access network, fog node time utilization, and application type have a greater energy impact. This shows the importance of application-level task orchestration compared to the physical placement of fog nodes in the network. Hence, we focused on adding the utilization of edge nodes into the orchestration logic. We also chose the FogFlow system as the implementation framework because its architecture gives users freedom in the design of their applications for a fog-computing model.

[71] provides a model for inspecting the trade-off between energy consumption and availability using a cooperative game-theoretical framework. Their task placement method could be applied to our work; however, due to implementation limitations, we chose the AHP method as the baseline.

Due to its distributed nature, fog computing can leverage green energy sources, since renewable energy providers are distributed across the energy distribution grid. Research carried out by Nan et al. [86] presents an analytical framework for incorporating green energy sources as the primary source and grid energy as a backup. Their optimization algorithm can serve as the orchestration logic depending on the use-case. We carried out our research to provide a system using the available metric data.

Studies such as [3] have proposed a design to control energy management in a distributed manner over fog nodes. They focused on energy management at the house level in a city. In our scope, their control-as-a-service logic is an example application that can be deployed over fog nodes, while our system tries to orchestrate different applications. [69] is another example, an activity recognition application using the fog model. These works are at the prototype level, and to become real-world applications, other aspects such as privacy have to be studied in more depth.

2.5 Fog Computing Platforms

Currently, there exist several IoT edge computing frameworks from different companies; Amazon Greengrass [8], Azure IoT Edge [66], The Linux Foundation EdgeX, and FogFlow are some examples. The comparison of edge computing platforms can be summarized as follows:

Data Model: To accomplish computing over the edge of the network, it is crucial to have a standardized data format to exchange context between entities, from IoT sensors to edge computing devices and cloud resources. NGSI (Next Generation Service Interface) [126] is the standard for data exchange between IoT services. It provides a unified data model and communication interface, which brings simplicity and interoperability with other IoT platforms in the FiWARE ecosystem. Other systems use topic-based or raw data, which is not as flexible as NGSI.

Service Design Method: FogFlow services are composed against a global view of all cloud nodes and edge nodes, instead of taking the perspective of each edge node. This layer of abstraction provides simplicity for users when developing and managing the system, while the FogFlow system handles the low-level node management and data exchange. Other IoT edge platforms, however, are more static: applications are defined by specifying the edge nodes. In FogFlow, a service is encapsulated in a Docker container and can be deployed on the cloud or on an edge node dynamically, depending on the state of the system, in a distributed manner.

Triggering Mechanism: Powered by NGSI, FogFlow can subscribe to context events, which can be automatically derived from the data. On other platforms, a topic-based mechanism is used, which requires more development.

Orchestration View: FogFlow views all of the nodes, at the edge or in the cloud, uniformly. Edge nodes can announce themselves dynamically to the FogFlow system and process data according to their capacity and the system's needs. For other systems, the edge node should be specified explicitly, and a backup node is specified in the cloud for support.

Programming Model: In FogFlow, the programming model can be described as serverless functions over the edge that are triggered by contextual data in order to process it. In other frameworks, the programming model consists of functions that process topic-based data.

Full feature comparison of current edge-computing frameworks can be found in the conclusion part of [130].

2.5.1 Fog Computing in IoT

As discussed in the terminology section, fog computing is an umbrella term that encompasses everything from things to clouds. Some of the fog computing or edge computing platforms are not mature enough to fully support all of the technologies. The works most relevant to our research are discussed in this section, and we have built our research on top of the research literature discussed further in this chapter.

2.5.2 Platforms

There are various edge-computing and fog-computing platforms under development at the moment. Some companies, such as cloud providers, have also started to extend the functionality of their products to support fog computing, while other companies support open-source projects. Since many platforms are under active development, a full comparison of the platforms would become outdated in a matter of months. There is no right or wrong choice of platform to contribute to within the scope of this research. However, an overall comparison of frameworks is given in this section to show the overall state of the platforms at the time of writing [102] [4].

Nebbiolo [61] is a product of a startup company. It is not open-source and is not an extension of another cloud product. Their products aim at the smart industry, and their solution runs only on the specific brands of hardware listed on their website.

FogHorn Lightning [49] is a product of a company; it is not open-source and hence is not open for research development. However, as a product, it comes with support for business use-cases.

Cisco IOx [20] is a mixture of Cisco IOS [87] and Linux OS. It is a product that comes with Cisco's support. It requires specialized hardware to operate, and the platform is not entirely open-source.

IBM Watson IoT [63] is an extension of the IBM Watson product and is therefore a cloud-based system. It might be a powerful solution for large enterprises that use other IBM products, for interoperability with other Watson products.

AWS Greengrass [8] is a commercial product that extends the AWS cloud products. It is not open-source, but it is a product that comes with support.

Dell Edge Device Manager [127]: Wyse provides edge device orchestration that handles Dell devices. It is a commercial product that is not open for research purposes.

OpenStack++ [56] is one of the first cloudlet projects; it was developed as an extension of the OpenStack project using a modified version of QEMU. It is a result of the Elijah project [34] at Carnegie Mellon University. Some projects built on top of Elijah are available on the "Open Edge Computing Initiative" website [89].

Edge Computing Group - Openstack [93] is a working group that is working toward making OpenStack ready for the challenges of edge computing. According to the mission statement of the project [30]: "The OSF Edge Working Group will identify use cases, develop requirements, and produce viable architecture options and tests for evaluating new and existing solutions, across different industries and global constituencies, to enable development activities for Open Infrastructure and other Open Source community projects to support edge use cases."

ParaDrop [1] is an open-source project and hence is open to extension for research purposes. It is a pre-product. It allows developers to deploy services on WiFi routers or set-top boxes and is suitable for smart-home applications.

Microsoft Azure IoT Edge [66] is an open-source project that extends the Microsoft Azure Cloud. It is still a pre-product.

Stack4Things [112] was initially developed by the University of Messina and is now a product supported by SmartME.io Srl. In one of their use-cases [13], they use network function virtualization (cloud-based) for a smart-mobility application that interacts with a smart city.

OpenVolcano [94] [14] is a part of the Horizon 2020 Input project [64]. It is more oriented toward the network layer, as an enabler for 5G infrastructure. It combines NFV and SDN using the OpenFlow protocol to extend the OpenStack project.

FogFlow [110] is an open-source project that is part of the FiWARE European project. It enables serverless context processing; more on FogFlow in section 2.5.4.

2.5.3 GeeLytics

"Geelytics: Enabling On-demand Edge Analytics Over Scoped Data Sources" is the title of a study on enabling edge analytics and data processing at the edge [17]. Existing stream processing platforms such as Apache Storm, Heron, S4, and Spark Streaming are designed to manage streams in a cluster setup or cloud environments. As an example, the data to be processed in conventional streaming systems can originate from a NoSQL database such as MongoDB or HBase, or it can come from a message broker such as Apache Kafka. In IoT systems with highly distributed devices, such designs may not be easily suitable, since a data source can be a low-powered sensor from an unknown vendor. Therefore, they propose an architecture for a system to automate stream processing over the edge in the IoT context.

They try to fill the technological gap by bringing together suitable technologies, building a system using the RabbitMQ message broker and several new concepts in their architecture. They provide a method for complex task decomposition that can be managed on the fly over edge nodes. They mention that, in the future, the Geelytics system can be extended with auto-scaling, mobility, and security support.

Other Analytics Platforms

Other platforms are AGT Analytics [65] and Edgent, which is incubated in the Apache Software Foundation (ASF) [31]. Among commercialized solutions, we can name Iguazio edge analytics [29] and Cisco Kinetic [21], formerly known as ParStream.

Research at CMU uses image processing on cloudlets [107].

All of these platforms focus on the stream itself rather than the dynamic allocation of analytics over edge nodes. Each platform is getting better day by day by adding new features.

It is essential to mention that a variety of challenges in fog/edge computing were addressed before the terms fog or edge were coined. Studies such as [97], [134] are example cases that, while not mentioning fog by name, head in a similar direction. For example, in [99], the mobile device could be anything, such as an IoT device.

2.5.4 FogFlow

While the Geelytics architecture was designed to provide a system architecture for edge stream processing, the FogFlow system is concrete software that can be integrated with the FIWARE system to enable edge processing. FogFlow [18] enables easy programming to process IoT data over the cloud and the edge for smart cities. In FogFlow, the data has the form of the NGSI contextual format and, consequently, it can be integrated with other IoT systems that consume or produce data in the NGSI standard. The general architecture of FogFlow is depicted in Figure 5.

Figure 5: FogFlow Overall Architecture.

FogFlow has different components; here, we demonstrate the basic functionality of the primary ones.

Context Management

FogFlow orchestrates fog functions (serverless functions), which get triggered based on contextual events. Context is the standardized global data structure in the FiWARE system that provides interoperability between IoT devices. FogFlow is a framework for advanced serverless processing of context data.

The schema of the data is IoT contextual data in the form of NGSI. Each IoT device speaks to an IoT Broker for publishing or getting data via the NGSI10 interface API. An IoT Broker is discoverable by querying the IoT Discovery module via an NGSI9 API endpoint. Each IoT Broker announces itself to the IoT Discovery upon startup. The IoT Discovery stores the profile of the data, and the IoT Broker stores the last contextual value of the data. For example, a temperature sensor would have a profile, such as its name, address, sensor type, and model, that is stored in the Discovery module's database. The sensor generates contextual data (a temperature reading every couple of minutes) that is announced to an IoT Broker, which stores the last value of the data.
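To make this data flow tangible, the sketch below publishes one temperature reading to a broker through the NGSI10 updateContext operation, roughly following the NGSIv1 JSON binding. The broker address, entity id, and exact attribute field names are assumptions; the authoritative payload layout is defined by the NGSI specification and the FogFlow/Orion documentation.

```python
"""Illustrative sketch: publish a temperature reading via NGSI10 updateContext."""
import json
import urllib.request

BROKER = "http://localhost:8070/ngsi10/updateContext"   # hypothetical broker endpoint

payload = {
    "contextElements": [{
        "entityId": {"id": "Device.Temperature.001", "type": "Temperature",
                     "isPattern": "false"},
        "attributes": [{"name": "temperature", "type": "float", "value": 21.5}]
    }],
    "updateAction": "UPDATE"
}

req = urllib.request.Request(BROKER, data=json.dumps(payload).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req, timeout=5) as resp:
    print(resp.status, resp.read().decode())
```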

A more in-depth tutorial on FIWARE context management is available; however, here we take two possible NGSI10 interactions as an example to shed light on the

(36)

Figure 6: NGSI10 Query Context Advanced Provider.

internal interaction of FogFlow components. Such interactions are building blocks of our contribution in the next chapter.

All of the interactions in NGSI would happen through RESTful APIs. The context can be any data in the IoT system. Figure 6 shows a simple query of data when the IoT Broker can query the context provider.

We can see in figure 6 that if IoT device can accept HTTP request to get queried via NGSI interface, the Broker can query the Context from IoT devices upon re- quest. However, for more straightforward context providers, an iterative update model can be used. The context broker would only save the latest context value.

Subscription

It is possible to subscribe to a specific context. For example, an actuator can be subscribed to the temperature context. Whenever the Broker receives data through the NGSI interface, it checks the list of subscribers to that context and notifies them. Any IoT device can issue a subscription to contextual data and get notified by the Broker when the specified context becomes available. An example subscription sequence is depicted in figure 8.
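The following sketch illustrates how such a subscription could be issued over NGSI10, assuming a hypothetical broker address and a hypothetical notification endpoint on the subscriber's side.

```python
# Minimal sketch of an NGSI10 subscription; the "reference" is the callback
# URL that receives notifyContext requests when the context changes.
# Addresses are hypothetical placeholders.
import requests

subscription = {
    "entities": [
        {"type": "Temperature", "isPattern": "false", "id": "Sensor01"}
    ],
    "attributes": ["temperature"],
    # Endpoint of the subscriber (e.g. an actuator) receiving notifications.
    "reference": "http://actuator.example:8888/notify",
    "duration": "P1M",
    "notifyConditions": [
        {"type": "ONCHANGE", "condValues": ["temperature"]}
    ]
}

resp = requests.post("http://broker.example:8070/ngsi10/subscribeContext",
                     json=subscription, timeout=5)
print(resp.json())  # contains the subscription id for later updates
```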


Figure 7: NGSI10 Query Context Simple Provider.

Create an IoT Device

To register an IoT device, we can use the web interface (Designer) or an API call. The process is shown in figure 9. A device can also issue this registration by itself.

2.6 FogFlow as the Selected Platform

In this part, we give an overall picture of the different components of FogFlow and point out which parts are open to extension for this research.

The FogFlow framework has been developed as an implementation of the generic enabler for fog computing in the FiWARE platform. It is interoperable with various IoT device types; for example, the physical-layer protocols of IoT sensors are handled by the different IoT Agents of FiWARE. FogFlow accesses the data through standardized NGSI REST APIs.


Figure 8: NGSI Subscription and Notification of IoT Devices.


Figure 9: Create an IoT Device.

2.6.1 Overall Architecture of The FogFlow Components

Each of these nodes runs as a Docker container, and they are deployed using the docker-compose tool [27]. The general architecture is shown in figure 10.

Master (Cloud)

The Master is the heart of the FogFlow system; it coordinates all of the edge nodes.

Task offloading decisions take place in the Master, and our contribution adds decision-making logic to it. Based on the insight gathered from the Prometheus node, edge nodes are ranked via the Analytical Hierarchy Process, and the best one is selected for offloading the task; a minimal sketch of such a ranking is given below.
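The following Python sketch outlines the idea behind this ranking: criteria weights are derived from a pairwise-comparison matrix (AHP), and each edge node is scored by a weighted sum of its inverted utilization metrics. The criteria, comparison values, and metric numbers are illustrative placeholders, and the actual procedure is described later in this thesis.

```python
# Minimal AHP-style ranking sketch (not FogFlow's actual Go implementation);
# the criteria and all numeric values below are hypothetical.
import numpy as np

# Pairwise comparison of criteria (CPU utilization, memory utilization,
# CPU saturation) on Saaty's 1-9 scale.
comparison = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 3.0],
    [1/5, 1/3, 1.0],
])

# Criteria weights: principal eigenvector of the comparison matrix.
eigvals, eigvecs = np.linalg.eig(comparison)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

# Per-node metric values in [0, 1] (lower is better), e.g. from Prometheus.
nodes = {
    "edge-node-1": [0.60, 0.40, 0.10],
    "edge-node-2": [0.20, 0.70, 0.05],
}

def score(metrics):
    # Lower utilization/saturation should rank higher, so invert the values.
    inverted = 1.0 - np.array(metrics)
    return float(np.dot(weights, inverted))

best = max(nodes, key=lambda n: score(nodes[n]))
print(best, {n: round(score(m), 3) for n, m in nodes.items()})
```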

Designer (Cloud)

The Designer is the web interface of FogFlow. Orchestration decisions are logged and can be inspected in a browser at the Designer's address.


Figure 10: Basic FogFlow Components.

Discovery, DB (Cloud)

NGSI9 is used to discover the availability of context. An IoT device contacts the Discovery node to find the closest Broker and starts interacting with that Broker. In a mobile scheme, the IoT application can periodically contact Discovery to find the Broker address. The idea is similar to service discovery in the microservices world.

RabbitMQ (Cloud)

The management communication between the Master and the workers takes place via topic-based publish/subscribe using the RabbitMQ server. The Master publishes orchestration commands, and the edge nodes (more specifically, the worker nodes) subscribe to these commands and act upon them.

Broker (Cloud, Edge)

NGSI10 context is exchanged via Context Brokers in the FiWARE system. NGSI is the standard, and there are a variety of implementations of it; Orion [60] is an example broker which can serve as the point of connection to other FiWARE components. NGSI is a European standard [22].

Worker (Edge)

The worker node interacts with the master node in the cloud and is responsible for starting, stopping, and configuring containers on edge nodes. It is the part of the FogFlow system that receives commands from the Master via RabbitMQ; a conceptual sketch is given below.
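Conceptually, the worker's job resembles the following Python sketch based on the Docker SDK. FogFlow's real worker is implemented differently (in Go), and the image name and environment variables here are hypothetical.

```python
# Conceptual sketch of a worker starting and stopping an operator container
# when instructed by the Master; values are illustrative placeholders.
import docker

client = docker.from_env()

def start_operator(image="example/face-detection-operator:latest",
                   broker_url="http://broker.example:8070/ngsi10"):
    # Run the operator detached; it connects to the local Broker and then
    # receives/publishes NGSI context data.
    return client.containers.run(
        image,
        detach=True,
        environment={"BROKER_URL": broker_url},
    )

def stop_operator(container):
    # Tear the operator down when the Master revokes the task.
    container.stop()
    container.remove()
```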

Operator (Edge)

The user-developed software (operator) is started and stopped dynamically according to the Master's orchestration commands. Our decision-making algorithm ultimately determines on which edge node an operator is started.

2.6.2 Serverless Task: Fog Function

A fog function is the fundamental element of computation in the FogFlow system. It is introduced to the system by an admin or application developer, and FogFlow places it on cloud or edge nodes. Although an admin introduces a fog function to the system, it gets instantiated on the fly according to the contextual data of IoT devices. This approach is also known as serverless computing.

To instantiate a fog function, the following parameters should be defined (an illustrative sketch follows the list):

Name: The name of a fog function is used for reference.

Code: The logic of the fog function is the computation that it performs on the data. It is introduced to the system as a Docker container: the application developer builds a Docker image, and the admin configures the fog function to use that image. The container should expose specific NGSI HTTP APIs to be able to work in the FogFlow system.

Input Triggers: Data is generated in an IoT system as context and is exchanged via Context Brokers. The input trigger is a filter that defines what type of data is fed to the fog function.


Figure 11: Registering the Fog Function in the FogFlow system.

Group By Attribute: The IoT context data is fed into a fog function, and FogFlow uses the group-by attribute of the data to determine how many instances of the fog function should be instantiated on the edge or cloud nodes. In our use case, each fog function instance is designed to handle the data of one car camera; that is, for each camera (IoT device), there is one Docker container instance of the fog function to process that camera's data. Hence, the group-by attribute in our use case is car_id.

Select Conditions: If there is any specific condition on the input data, it can be specified in this field. While the input trigger specifies the data (context) type that triggers the fog function, the select condition makes it possible to specify criteria on the data (context) value.
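As a summary of the parameters above, the following sketch groups them into a single illustrative specification for our face-detection use case. The field names and values are chosen for readability only and do not reflect the exact FogFlow API schema.

```python
# Illustrative grouping of the fog function parameters described above;
# this is NOT the actual FogFlow registration payload.
fog_function = {
    "name": "face-detection",
    # Docker image implementing the processing logic and exposing the NGSI
    # HTTP endpoints expected by FogFlow.
    "code": "example/face-detection-operator:latest",
    # Trigger: instantiate the function whenever camera context data arrives.
    "input_triggers": [{"entity_type": "Camera"}],
    # One function instance per car camera.
    "group_by": "car_id",
    # Optional filter on the context values.
    "select_condition": None,
}
```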

The internal interactions of FogFlow can be found in the documentation; however, the major sequence of events is worth summarizing here. Figure 11 shows the steps taken by the user and the internal components of FogFlow to create a fog function. After the user creates a fog function and specifies its details, FogFlow registers the fog function and enables it; the fog function is then instantiated when input data becomes available. Figure 12 shows the interactions that occur when a fog function is triggered by IoT data.

2.7 Prometheus

Prometheus [100] was originally developed by SoundCloud [111] engineers and was later incubated as a Cloud Native Computing Foundation [62] project.


Figure 12: Trigger method of a fog function based on contextual data (generated by IoT devices).


According to the documentation [101], Prometheus is:

”a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.”

Prometheus is used to scrape metrics from different endpoints (fog nodes), and it provides an HTTP API for accessing the collected data.
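For example, assuming cAdvisor metrics are being scraped from the fog nodes, an aggregated view of per-node CPU usage can be pulled from the Prometheus HTTP API as in the following sketch; the Prometheus address is a hypothetical placeholder.

```python
# Minimal sketch of querying Prometheus over its HTTP API; the address is a
# hypothetical placeholder, and the PromQL assumes cAdvisor is scraped.
import requests

PROMETHEUS = "http://prometheus.example:9090"

# Per-node CPU usage rate averaged over the last minute.
query = 'sum by (instance) (rate(container_cpu_usage_seconds_total[1m]))'

resp = requests.get(f"{PROMETHEUS}/api/v1/query",
                    params={"query": query}, timeout=5)
for result in resp.json()["data"]["result"]:
    instance = result["metric"].get("instance", "unknown")
    timestamp, value = result["value"]
    print(instance, timestamp, value)
```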

2.8 Conclusion

In this chapter, we briefly reviewed the related state-of-the-art technologies and research. We found that Docker container technology is the most suitable in terms of maturity, ease of access, community, and technological overhead.

Moreover, among the different data-exchange standards, NGSI is the most widely used and is backed by the European Union. FogFlow was chosen in the end since it fits the scope of this thesis and is open to contributions. The fundamental elements of FogFlow were also covered in this chapter so that the next chapters can focus on the main contributions instead of laying groundwork.


In this chapter, we cover the theory and reasoning behind the two main elements of this thesis: monitoring and ranking. In a distributed system, thousands of time-series metrics are available, and determining which ones affect the Quality of Service is a complicated task. We have not performed an exhaustive study of the effect of each metric; however, we present the most general metrics that could be used in such a system. Furthermore, we explain how to perform decision-making using the Analytical Hierarchy Process (AHP) to interpret the metric data and reach a decision.

3.1 Monitoring Edge Nodes

The first important part of our work is to design and develop a monitoring component that can be integrated into the FogFlow system for gathering data. In this part, we try to answer the question of which metrics have to be captured.

3.1.1 Availability of Metrics

Each system can expose some form of time-series data to be captured; however, different deployments generate different metrics. There might be hardware-dependent features that complicate decision-making. For example, as explained later in this chapter, it is recommended to capture the error rate of a resource, but capturing the error rate of a CPU is tricky or impossible at the software level, or it might require some extension to the system. The metrics we introduce in this part are for an ideal system; in practice, the FogFlow system cannot provide all of these metrics at the moment. More metrics would become available by changing the container controller from Docker to Kubernetes.


3.1.2 Metrics in the system

The FogFlow system currently has no method for monitoring its components. The type and weight of the metrics to gather depend on the application deployed on the edge device.

For this system, the use case is the ranking of edge nodes to improve Quality of Service and scalability in scenarios where geographical distinction alone cannot provide good QoS.

We found the problem requirements similar to those of deciding when to auto-scale pods in a Kubernetes cluster, and we were inspired by the approaches taken there. A cluster of containers managed by Kubernetes generates hundreds of time-series metrics at the network, operating-system, and application levels, so understanding which metrics affect the availability and scalability of the system is of crucial importance.

Four Golden Signals

The famous Site Reliability Engineering book [11] from Google discusses the different cluster-management and DevOps challenges Google has faced over the years. In its chapter on monitoring distributed systems, it is mentioned that out of all the time-series metrics a distributed system can produce, four of them, known as the "Four Golden Signals," contribute the most to the scalability of the system and the reliability of its services. The signals are [54]:

Latency: The time it takes for a service to respond to a request. Latency concerns the time of successful service rather than the time of failure, and this difference should be kept in mind because errors can be very fast or very slow: a simple connection problem generates a swift HTTP 5xx failure code, whereas a timeout in internal service discovery results in very long response times for errors, which would mislead the calculations.

Errors: The rate of requests to a service that fail, either implicitly or explicitly.

Traffic: The amount of demand being put on a service by users. This metric is usually derived from high-level, application-dependent data. For a web service, this is usually HTTP requests per second, and for an IoT
