
The choice of container orchestrator between Docker Swarm and Kubernetes is a matter of prioritisation between simplicity and functionality. Docker Swarm offers the basic set of features expected from a container orchestration platform. Its simple architecture adds little to no performance overhead on top of Docker and makes it easy to use and maintain. While Kubernetes falls behind in simplicity and performance, it makes up for it with a comprehensive set of features for deploying complex and robust enterprise applications. Docker Swarm is thus ideal for environments requiring simplicity, ease of use and fast deployment. For more complex applications requiring features that Docker Swarm cannot offer, Kubernetes is the reasonable choice.

One of the most important features still missing in Docker Swarm is automatic scaling, which is needed for the elasticity expected from cloud services. The lack of this feature can easily lead to favouring Kubernetes over Docker Swarm. As this thesis shows, automatic scaling can be achieved in Docker Swarm with existing open-source components and its functionality extended with a relatively simple custom program. Therefore, the necessity for automatic scaling alone should not lead to choosing Kubernetes over Docker Swarm. Instead, a more thorough comparison should be conducted. The comparison results presented in this thesis can be used exactly for that.

The microservice architecture proposed in this thesis works as an excellent starting point for distributing the target system across multiple servers. Preliminary tests show that the approach works well for the simulator used as a case example. Further development of the system includes productising the proof-of-concept implementations constructed in the scope of this thesis and introducing the approach to other simulators. Considering the comparison conducted in this thesis, Kubernetes should be chosen to orchestrate the target system. It offers a broader set of features, which supports the productisation of the system in the future, and its shortcomings in complexity are overcome by the organisation's competence in working with Kubernetes. If managing the cluster creates too much overhead, Kubernetes could be acquired as a service. However, with its simplicity, Docker Swarm offers a great alternative for prototyping and testing new approaches quickly and with little management overhead.

This thesis was able to answer its research questions and indicate the direction for future development of the target system. The investigation of how different autoscaling solutions for Docker Swarm perform in a production environment, and how they compare to the Horizontal Pod Autoscaler of Kubernetes, did not fit in the scope of this thesis but would be an interesting topic for future research. Future research could also extend the existing performance comparisons of Docker Swarm and Kubernetes. This thesis was able to provide an updated feature comparison of the orchestrators, but the most recent performance comparisons are still from 2019. In those comparisons, Docker was used as the container runtime in Kubernetes, but its support has since been deprecated. It would be interesting to see how using containerd directly, or even a different container runtime altogether, affects the performance of Kubernetes.

It should also be noted that Docker Swarm and Kubernetes are still under active development and new features are introduced frequently. The comparison conducted in this thesis is likely to become outdated as time passes, which is why similar studies should be conducted regularly to keep up-to-date information available to potential users of the orchestrators.


APPENDIX A: ALERT-BASED AUTOSCALING IN DOCKER SWARM

The alert-based autoscaling solution for Docker Swarm consists of four open-source components: cAdvisor, Prometheus, Alert Manager and Orbiter. This appendix describes the process of configuring and deploying the components to achieve automatic scaling.

Prometheus should be configured to scrape metrics from cAdvisor, as illustrated in Program 3. The same configuration also establishes a connection between Prometheus and the Alert Manager, which is used to monitor the metrics and trigger scaling.

Program 3. Configuring Prometheus to scrape metrics from cAdvisor and directing alerts to the Alert Manager service.

Prometheus alerts are defined in a separate file presented as Program 4. The metric for scaling the service is selected with the PromQL query METRIC_QUERY in Program 4.

In the case of the average CPU utilisation level, the following query is used:

avg(sum(rate(container_cpu_usage_seconds_total{container_label_com_docker_swarm_service_name=~".*SERVICE_NAME$"}[1m])) by (name)),

where SERVICE_NAME is the name of the service. The query first retrieves the rate of total CPU usage for each container of the service². The value for each container is an

² The name of the service is not directly compared to SERVICE_NAME, because Docker adds the name of the deployment at the beginning of each service name. Instead, a regular expression that matches service names that end with SERVICE_NAME is used.

prometheus.yaml

rule_files:
  - "alerts.yaml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

array that contains the CPU utilisation levels for each CPU core. To get the total CPU usage for each container, the query then calculates the sum of the values in the arrays by container name. Finally, the query calculates the average CPU utilisation across the containers. The SCALE_UP_THRESHOLD and SCALE_DOWN_THRESHOLD in Program 4 are used to define the desired range of the metric value.
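As a sketch of what the aggregation computes, the following Go snippet mirrors the query on hypothetical per-core rate values; the container names and numbers are made up for illustration and are not measurements from the target system:

```go
package main

import "fmt"

// averageCPUUtilisation mirrors the PromQL query: for each container,
// sum the per-core CPU usage rates, then average the sums across containers.
func averageCPUUtilisation(perCoreRates map[string][]float64) float64 {
	if len(perCoreRates) == 0 {
		return 0
	}
	total := 0.0
	for _, rates := range perCoreRates {
		sum := 0.0
		for _, r := range rates {
			sum += r // sum over the CPU cores of one container
		}
		total += sum
	}
	return total / float64(len(perCoreRates)) // average over containers
}

func main() {
	// Hypothetical per-core rates for three replicas of the service.
	rates := map[string][]float64{
		"simulator-a.1": {0.30, 0.20}, // sums to 0.50
		"simulator-a.2": {0.40, 0.30}, // sums to 0.70
		"simulator-a.3": {0.35, 0.25}, // sums to 0.60
	}
	fmt.Printf("%.2f\n", averageCPUUtilisation(rates)) // prints 0.60
}
```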

Program 4. Configuring Prometheus to create alerts based on metrics and thresholds.

Alert Manager is configured with a separate file, alertmanager.yaml, presented in Program 5. The configuration file defines two receivers for the alerts: one for scaling up and another for scaling down. The receivers send an HTTP request to Orbiter with routes specific to the service that should be scaled. Orbiter registers these routes for all services that have the label orbiter set to true. It should be noted that in the service name in the route, the prefix nst should match the deployment name used when deploying the stack.

The solution can be deployed by using the docker-compose.yaml file presented in Program 6. In a production setup, separate networks should be configured for monitoring and NST communication, and named volumes should be defined for Prometheus and Alert Manager to store their data. These have been left out of Program 6 for brevity and clarity. All the configuration files presented in this appendix should reside in the same directory, in which the following command can be run to deploy the solution:

docker stack deploy -c docker-compose.yaml nst,

where nst is the name of the deployment that matches the deployment name in the Alert Manager configuration.

Program 5. Configuring Alert Manager to send scaling requests to Orbiter whenever scaling alerts are received.

alertmanager.yaml

global:
  resolve_timeout: 1m

route:
  receiver: 'default-receiver'
  group_by: [alertname]
  group_interval: 2m
  routes:
    - match:
        scale: up
      receiver: simulator_up
    - match:
        scale: down
      receiver: simulator_down

receivers:
  - name: 'default-receiver'
  - name: 'simulator_up'
    webhook_configs:
      - url: 'http://orbiter:8000/v1/orbiter/handle/autoswarm/nst_simulator-a/up'
  - name: 'simulator_down'
    webhook_configs:
      - url: 'http://orbiter:8000/v1/orbiter/handle/autoswarm/nst_simulator-a/down'

Program 6. Deployment definition for deploying the alert-based autoscaling solution.

    - /var/run/docker.sock:/var/run/docker.sock
    deploy:

    command: --config.file=/etc/prometheus/prometheus.yaml
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yaml

  alertmanager:
    image: prom/alertmanager
    command: --config.file=/etc/alertmanager/alertmanager.yaml
    volumes:
      - ./alertmanager.yaml:/etc/alertmanager/alertmanager.yaml

  cadvisor:
    image: cadvisor/cadvisor
    ports:
    volumes:
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:

APPENDIX B: IMPLEMENTING AN AUTOSCALER FOR DOCKER SWARM

The custom autoscaling solution presented in this thesis combines the functionality of Alert Manager and Orbiter into a single component called Autoscaler. The implementation of Autoscaler and the configuration of the solution are described in this appendix.

Autoscaler is a daemon process that runs Program 7 periodically to monitor Prometheus metrics and scale the services in a swarm. The scaling behaviour is controlled with the following service labels, all prefixed with autoscale: min, max, query and desired_metric. The first two set the limits for the number of replicas deployed for the service, query is the PromQL query used to fetch the metric, and desired_metric is the target value at which the metric should be kept. Autoscaler detects the services that have the labels in place, fetches the metrics for each of them from Prometheus and scales the services accordingly. Autoscaler can be implemented in any programming language that can use the PromQL API to retrieve the metrics and the Docker API to scale the services. The approach was tested by implementing a proof-of-concept Autoscaler in Go, connecting to Prometheus over HTTP and to the Docker API with the Docker client package.
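As an illustration, a service could opt into autoscaling with deploy labels along the following lines; the structure is a sketch based on the label names above, with METRIC_QUERY and DESIRED_METRIC_VALUE as placeholders rather than values from the actual configuration:

```
  simulator-a:
    deploy:
      labels:
        autoscale.min: "1"
        autoscale.max: "5"
        autoscale.query: "METRIC_QUERY"
        autoscale.desired_metric: "DESIRED_METRIC_VALUE"
```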

Program 7. Scaling Docker Swarm services automatically with metrics provided via Prometheus. Detect() is a function that connects to the Docker API and returns the services that have the label "autoscale", Metadata() returns the name and labels of a service, ParseLabels() returns the labels starting with "autoscale", GetMetric() fetches the metric from Prometheus, GetReplicas() connects to the Docker API and returns the number of replicas of a service, Clamp() clamps an integer between a minimum and a maximum value, and ScaleService() connects to the Docker API and scales the service to the specified number of replicas.

PROGRAM Autoscale():
    services <- Detect()
    FOR service IN services DO:
        service_name, labels <- Metadata(service)
        min, max, desired_metric, query <- ParseLabels(labels)
        current_metric <- GetMetric(query)
        current_replicas <- GetReplicas(service_name)
        desired_replicas <- ⌈current_replicas * current_metric / desired_metric⌉
        target_replicas <- Clamp(desired_replicas, min, max)
        ScaleService(service_name, target_replicas)
    ENDFOR
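The replica calculation at the heart of the pseudocode can be sketched in Go, the language of the proof-of-concept implementation; the function names here are illustrative and not taken from the actual code:

```go
package main

import (
	"fmt"
	"math"
)

// clamp restricts the replica count to the [min, max] range set via service labels.
func clamp(v, min, max int) int {
	if v < min {
		return min
	}
	if v > max {
		return max
	}
	return v
}

// targetReplicas implements the scaling rule from the pseudocode:
// desired = ceil(current * currentMetric / desiredMetric), clamped to [min, max].
func targetReplicas(current int, currentMetric, desiredMetric float64, min, max int) int {
	desired := int(math.Ceil(float64(current) * currentMetric / desiredMetric))
	return clamp(desired, min, max)
}

func main() {
	// 3 replicas with an average CPU metric of 0.9 against a target of 0.5:
	// ceil(3 * 0.9 / 0.5) = ceil(5.4) = 6, clamped to the maximum of 5.
	fmt.Println(targetReplicas(3, 0.9, 0.5, 1, 5)) // prints 5
}
```

Dividing the current metric by the target value makes the rule symmetric: a metric above the target scales the service up, and a metric below it scales the service down, with the clamp keeping the replica count within the configured limits.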

As with the solution utilising Alert Manager and Orbiter, Prometheus is configured to scrape metrics from cAdvisor, as can be seen in Program 8.

Program 8. The Prometheus configuration for the custom autoscaling solution.

The solution can be deployed by using the docker-compose.yaml file provided in Program 9. In a production setup, a separate network should be created for the autoscaling solution and a named volume defined for Prometheus to store the metrics. They are left out of Program 9 for brevity and clarity. The Autoscaler configuration can be seen in the Compose file, where simulator-a has been configured to scale automatically by using labels. The solution is deployed by running the following command in the directory containing the configuration files presented in this appendix:

docker stack deploy -c docker-compose.yaml nst,

where nst is the name of the deployment.

prometheus.yaml

scrape_configs:
  - job_name: cadvisor
    scrape_interval: 5s
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

Program 9. The deployment definition for deploying the custom autoscaling solution.

    - --config.file=/etc/prometheus/prometheus.yaml
    volumes:
      - ./prometheus.yaml:/etc/prometheus/prometheus.yaml

  cadvisor:
    image: cadvisor/cadvisor
    ports:
    volumes:
      - /var/lib/docker/:/var/lib/docker:ro
    depends_on:

    image: simulators/autoscaler
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:

      - autoscale.desired_metric: DESIRED_METRIC_VALUE