Friday, November 15, 2019

Mesos and Kubernetes : A Comparative Analysis

Abstract

Containers and application containerization have quickly gained traction as one of the most promising aspects of cloud computing. A massive increase in the number and variety of applications has created a need for smooth integration between development and live environments, with quick service times. The amount of user data handled by today's applications requires heavy computing resources, which in turn require large clusters of hosts. Managing these large clusters is very challenging, and containers provide a viable solution. Containers provide operating-system-level virtualization for deploying and running applications in a distributed node topology, eliminating the need to configure a complete VM per application. Open source technologies like Docker have developed a method that provides better portability for containers. This paper presents a proposal for performance evaluation of two of the most widely used open source orchestration systems, Kubernetes and Mesos, for cloud-native applications. We also provide a brief overview of the importance of choosing the right container orchestration tool to deploy and manage cloud-native applications.

Keywords: Kubernetes, Mesos, Cloud native applications, Locust, GCE.

With the rapid spread of the internet, conventional and niche web applications are increasing in number. Deploying and maintaining each such application requires a wide range of hardware and associated software designed to perform several generic activities. Rapid progress in cloud computing technologies has aided in decentralizing implementations, leading to distributed systems. Docker technology provides containers for easy deployment and management of applications.
When carefully managed by cluster management tools like Kubernetes and Mesos, replication, failover and APIs can automate integration and lead to seamless deployment over clusters of host machines, thereby eliminating the service disruption caused by inherent downtime.

Kubernetes:

Kubernetes is an open source cluster manager project that integrates cluster management capabilities into a system of virtual machines. It is a lightweight, portable, modular, responsive and fault-tolerant orchestration tool that is written in Go and comes with built-in service discovery and replication utilities. Fig 1 shows the architecture and important concepts of Kubernetes.

Fig 1 Kubernetes Architecture

Important components of Kubernetes are:

Pods: A pod is the building block for scheduling, deployment, horizontal scaling and replication. It is a group of tightly coupled containers that are located on the same host and share the same IP address, ports, resources and localhost. [7]

Kubelet: The agent that runs on the worker nodes and manages pods, their containers, container images and volumes, if any.

Replication Controllers: They control and monitor the number of running pods for a service and provide fault tolerance. They are the high availability solution of Kubernetes.

Kubectl: The command used to control the Kubernetes cluster once it is running. It can be run on the master node or on any machine with access to the cluster's API server.

Kubernetes has a policy-driven scheduler (kube-scheduler) which considers availability, performance and capacity constraints, quality of service requirements, and workload. Kubernetes can also work with multiple schedulers. Users can add their own schedulers if other constraints are required. [7]

Mesos:

Apache Mesos is an open source cluster manager, developed by Benjamin Hindman, Andy Konwinski and Matei Zaharia at the University of California, Berkeley as a research project along with Professor Ion Stoica.
It is designed to scale to very large clusters of hundreds or thousands of hosts, running workloads such as Hadoop tasks and cloud-native applications. It enables fine-grained resource sharing, thus improving cluster utilization. Mesos sits between the application layer and the operating system, which makes it easier to deploy and manage applications efficiently in large-scale clustered environments. It can run many applications on a dynamically shared pool of nodes. The major components in a Mesos cluster are shown in Fig 2.

Fig 2 Mesos Architecture [6]

Mesos follows two-level scheduling. Each framework asks Mesos for the amount of resources it requires, and in response Mesos offers a set of resources. The framework scheduler evaluates the offered resources based on its own criteria and accepts or refuses them. [7]

Apache ZooKeeper acts as a central coordination service to achieve high availability. The design comprises multiple masters, of which one is the active leader, and ZooKeeper handles the leader election. A high availability setup needs a minimum of 3 master nodes.

Marathon is a framework designed to launch long-running applications, and it serves as a replacement for a traditional init system. It provides features such as high availability, application health checks, node constraints, fault tolerance and an easy-to-use web UI for long-running applications. The Marathon framework is composed of an executor and a scheduler. The Marathon UI provides options to start, stop and scale long-running applications.

Kubernetes and Mesos make the process of setting up multiple virtual clusters simpler, allowing stack management to shed unwanted layers of software which bog down systems. Using Kubernetes and Mesos for cluster management allows for high-level task monitoring, resource allocation and application scaling, whilst offering the control needed to ensure applications run smoothly.
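The two-level scheduling flow described above, in which Mesos offers resources and each framework scheduler accepts or refuses them, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the class and function names (Offer, Task, FrameworkScheduler, master_offer_round) are hypothetical and are not the Mesos API, and a real master also handles offer declines, fairness between frameworks and offer timeouts.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Resources the master offers from one agent (hypothetical shape)."""
    agent: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    """A unit of work the framework wants to run (hypothetical shape)."""
    name: str
    cpus: float
    mem_mb: int


class FrameworkScheduler:
    """Second level: the framework decides, by its own criteria, what to accept."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offer(self, offer):
        """Return the tasks launched on this offer; an empty list is a refusal."""
        launched = []
        cpus, mem = offer.cpus, offer.mem_mb
        for task in list(self.pending):  # iterate over a copy while removing
            if task.cpus <= cpus and task.mem_mb <= mem:
                cpus -= task.cpus
                mem -= task.mem_mb
                self.pending.remove(task)
                launched.append(task)
        return launched


def master_offer_round(offers, scheduler):
    """First level: the master hands each agent's offer to the framework."""
    return {o.agent: [t.name for t in scheduler.resource_offer(o)] for o in offers}


offers = [Offer("agent-1", cpus=2.0, mem_mb=7680), Offer("agent-2", cpus=1.0, mem_mb=3840)]
tasks = [Task("web", 1.5, 512), Task("worker", 1.0, 256), Task("batch", 4.0, 1024)]
scheduler = FrameworkScheduler(tasks)
placement = master_offer_round(offers, scheduler)
print(placement)
print([t.name for t in scheduler.pending])
```

In this toy run the "batch" task stays pending because no single offer is large enough for it, which mirrors the point that the framework, not the master, decides whether an offer is usable.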
Setting up either Mesos or Kubernetes on Windows means that developers and organizations that work across Linux and Windows platforms may use their own tools without requiring heavy resource management.

A. Container Orchestration tools and their importance:

Containers make it easy to run cloud-native applications on physical or virtual infrastructure. They facilitate application management that adapts dynamically to the changing needs of a service, and they enable seamless migration of application instances to different environments. Multiple containers need effective management utilities that manage resources and enable containers to run in different environments, over multiple hosts. Orchestration tools manage applications of varying complexity that are distributed for computing over a cluster of machines. These tools abstract the cluster as a single entity for deploying applications and managing resources. Orchestration tools can handle configuration, scheduling and deployment of applications, along with maintenance and support for automatic failover and scaling. Kubernetes acts primarily as a container orchestration tool, whereas Mesos provides a platform to run orchestration frameworks like Marathon or Aurora to manage applications, which may or may not be containerized. Comparing standalone Kubernetes orchestration with Marathon on Mesos is an effective way to understand the right choice for an implementation.

B. Proposed solution on Google Compute Engine:

No synthetic benchmarks exist to evaluate the performance of Kubernetes and Mesos. This paper aims at evaluating orchestration methods on Google Compute Engine (GCE) for hosted cluster installation and management. A single cluster in GCE for all purposes will have a master VM and four worker VMs.

Setting a baseline comparison through a simple cloud application deployment.
This is the first proposed benchmark, which analyses user experience with minimal containers on a Kubernetes cluster and a Mesos cluster. Having a Google Cloud Platform account and installing the Google Cloud SDK is the first step. A cloud application is then deployed on each created cluster to compare their respective deployment processes.

Streaming engine using Docker clusters on GCE, to check the delivery speed, scheduling and scalability of the container orchestrators. This also tests the pod feature of Kubernetes, where all containers in a pod share a single networking point.

Standalone analysis using existing tools to test the performance and known limitations of both systems: cAdvisor, which collects data about running containers; Heapster, which gives basic resource utilization metrics on Kubernetes; and the marathon-lb tool on Mesos Marathon.

This paper aims to provide qualitative as well as quantitative metrics to compare and contrast the working of Kubernetes and Mesos. The objective is to compile a substantive list of criteria for analysing the performance of both orchestration tools. The study intends to bring to light comparative results that do not yet exist in the related literature, and to build upon existing knowledge through the results of the experiments in this paper. Some of the comparative points are load balancing, scalability and user experience.

Distinctive features
  Kubernetes: Offers a combination of pods, which are controlled by replication controllers. IPC between containers in a pod uses System V semaphores or POSIX shared memory.
  Mesos: Does not support colocation of multiple containers on the same Mesos.

Application distribution
  Kubernetes: Supports master-worker nodes, where the applications are deployed in pods on worker nodes.
  Mesos: Supports master-agent nodes, where the applications are deployed on different agent nodes.

Resource schedulers
  Kubernetes: Has a policy-driven scheduler (kube-scheduler).
  Mesos: Has a two-level scheduling approach.

Scalability
  Kubernetes: Kubernetes 1.3 supports 2000-node clusters.
  Mesos: Has been simulated to scale up to 50,000 nodes. [9]

Load balancing
  Kubernetes: Supports both internal and external load balancing.
  Mesos: Mesos DNS (rudimentary load balancer), Marathon-lb (HAProxy-based load balancer for Mesos Marathon).

Monitoring tools
  Kubernetes: Heapster, cAdvisor and Google Cloud Monitoring, with InfluxDB and Grafana as backend tools for visualization.
  Mesos: Sysdig and Sysdig Cloud (full metrics and metadata support for Apache Mesos and the Mesosphere Marathon framework).

The implementation was done on Google Cloud Platform, using Google Compute Engine (GCE). The resources available under the account set up for the implementation are detailed below. For this implementation, two of the four available machine types have been used.

Resource                            n1-standard-1   n1-standard-2
Virtual CPU                         1               2
Memory (GB)                         3.75            7.50
Max no. of persistent disks (PD)    16              16
Max PD size (TB)                    64              64

A. Kubernetes ecosystem

The Kubernetes ecosystem is spread over two setups, as shown below.

                    Two-node setup   Four-node setup
Master node
  VMs               1                1
  CPU               1                1
  Machine type      n1-standard-1    n1-standard-1
Worker nodes
  VMs               1                3
  CPU               2 each           2 each
  Machine type      n1-standard-1    n1-standard-1

Table 3 Kubernetes ecosystem

Production-grade Kubernetes is available as open source and can be installed from its official page [10]. After Kubernetes is installed, the startup script kube-up.sh can be used to spin up a cluster. A cluster consists of a single master instance and a set of worker nodes, each of which is a Compute Engine virtual machine. Bringing up a cluster takes about ten minutes, and once the cluster is running, the IP addresses of all the nodes can be obtained from Compute Engine. Cluster specifications can be set using environment variables like NUM_NODES, MASTER_SIZE and NODE_SIZE, or in config-default.sh. kubectl is the command line interface for Kubernetes clusters.
It supports command types like create, apply, attach, config, get, describe and delete, and resource types like pods, deployments and services.

B. Mesos ecosystem

Different approaches were used to implement a Mesos cluster system within the available resources. The procedure followed for each implementation and the associated complexities are described briefly. The third implementation method, which was the one adopted for this project, is described in detail.

Single master, single slave

In the first setup method that was tried, the system was a single-node cluster consisting of ZooKeeper, Marathon, a single master process and a single agent process. The images for these were pulled from Docker Hub, using Docker installed on the GCE shell. Four containers were started, one for each of the processes listed. The Mesos master UI was accessible through the browser on its designated IP address, at port 5050. The Marathon UI was accessed through its external IP address at port 8080. This approach had two drawbacks. The setup used up all the available CPU, so a multi-node configuration could not be implemented. Further, a public Docker image poses trust issues for a system implementation. It was, therefore, decided to explore other options.

Datacenter Operating System (DC/OS)

DC/OS is a product of Mesosphere, a company which makes applications and solutions based on Apache Mesos. DC/OS is designed as a distributed operating system with Apache Mesos serving as its kernel. The intent is to abstract the functionality of multiple machines so as to combine them into a single computing resource. DC/OS can offer container orchestration, as it has the Marathon scheduler built into its backend. [11] Installing DC/OS on Google Compute Engine requires setting up a primary bootstrap node, on which the GCE scripts are run to create the cluster nodes.
A YAML installation file is run via an Ansible playbook to create and configure the cluster nodes with DC/OS running on them. Several settings have to be customized, such as setting up an RSA public/private key pair that allows SSH-based login to the cluster nodes. The team was unsuccessful in setting up a running DC/OS cluster on GCE. The support community for DC/OS is not very mature, and the installation issues faced by the team could not be resolved. Exploring the services of DC/OS has been included as one of the future work possibilities in this paper, as DC/OS shows great potential for effective container orchestration.

Installing VMs on GCE

In this method, the Mesos ecosystem is implemented over 6 virtual machines, using four n1-standard-1 and two n1-standard-2 machines. The system consists of 3 master nodes and 3 agents, with the Marathon and ZooKeeper processes running on VMs 1, 2 and 3, as shown in the figure below. The VMs with two CPUs are n1-standard-2 machines.

Fig 3 Mesos Implementation Diagram

The following processes are run on these VMs to establish a self-sufficient ecosystem.

Marathon: Marathon runs as a scheduling framework on Mesos and is deployed on VM1.

ZooKeeper: ZooKeeper is the process that manages which master process runs as active and which are kept on standby. ZooKeeper processes run on VM1, VM2 and VM3, so that a backup ZooKeeper process is always running to facilitate automatic failover of a master process.

Mesos Master: Three Mesos master processes are run, one each on VM1, VM2 and VM3. The ZooKeeper quorum selects one of these three masters to be active and the rest to be on standby.

Mesos Agents: Mesos agent processes run on VM4, VM5 and VM6. The agent on VM6 runs on an n1-standard-1 machine, whereas the agents on VM4 and VM5 run on n1-standard-2 machines.

The Kubernetes and Mesos cluster systems were set up as described in the implementation section.
Each ecosystem was evaluated in different scenarios, and the behaviour of the systems was analysed for each scenario in terms of scalability, load balancing and failover capabilities.

Kubernetes System:

Creating and deploying the application on Kubernetes is primarily driven by the specifications in the pod.yaml, deployment.yaml and service.yaml files.

pod.yaml
  Operation: A group of containers tied together for networking.
  Arguments specified: Docker image, shared volumes, CPU restrictions on a single pod.

deployment.yaml
  Operation: Used to schedule the creation of pods and check their health.
  Arguments specified: LivenessProbe, ReadinessProbe, replicas (to ensure the minimum number of pods that need to be running at all times).

service.yaml
  Operation: Exposes the created deployment outside the cluster.
  Arguments specified: loadbalancer, clusterIP.

Table 4 Kubernetes: application deployment components

Kubernetes Scalability:

The setup used for understanding scalability in Kubernetes is described in the Kubernetes ecosystem section. This process is aimed at gauging Kubernetes scalability in terms of the CPU resource utilization of clusters, auto scaling of pods, and API responsiveness. A web-based WordPress application was chosen for this purpose. Scaling in Kubernetes is achieved by horizontal auto scaling of pods, which dynamically adjusts the number of pods in a deployment to meet the load or traffic. A Horizontal Pod Autoscaler (HPA) can be created via the kubectl command kubectl autoscale deployment wordpress --cpu-percent=14 --min=1 --max=10. This means that the horizontal autoscaler will increase and decrease the number of pods, between 1 and 10, to maintain an average CPU utilization of 14% across all pods. It also facilitates automatic failover of pods.

Locust was used to create load on the WordPress application. Locust is an easy-to-use, Python-based load testing tool which is used to find out how many concurrent users a system can handle. It swarms the web application with a number of users, which is specified using the web UI.
Once the application was hosted by Kubernetes, load was directed at its load balancer ingress IP using Locust. The intention was to learn how the autoscalers react to the load generated by Locust. The results of the experiment are best presented in tabular form, as below. Parameters such as the minimum and maximum number of pods and the target CPU utilization were kept the same for both setups. The number of requests in the tables is the total number of users created by Locust.

Two-node setup (total 3 CPUs)

Number of pods (maximum)    10     50     150
Target CPU (%)              14     14     8
Max number of requests      575    966    3158
Failure %                   23%    23%    41%

Table 5 Kubernetes: Scalability in two node system

Four-node setup (total 7 CPUs)

Number of pods (maximum)    10     50     150
Target CPU (%)              14     14     8
Max number of requests      611    1433   7513
Failure %                   24%    20%    2%

Table 6 Kubernetes: Scalability in four node system

Observations from the above tabulated results:

With 1 to 10 pods, the number of pods did not have any significant impact on the failure percentage. Significant differences in the results appeared as the number of pods was increased.

As the number of requests increased, the number of pods increased as well. As the load went down, the pods were scaled down automatically.

Fig 4 Kubernetes pods in running and terminating states

With high load and a higher number of pods, the failure percentage was drastically lower in the four-node setup than in the two-node setup. With less load, the failure percentage is almost the same in the two setups.

The setup was benchmarked at a maximum of 150 pods. Going beyond this value left many pods in the pending state for longer than seven minutes. In other cases, starting a pod takes less than four seconds.

More pods are created when the target CPU percentage specified in the horizontal autoscaler command is lower.
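The last observation follows from the scaling rule given in the Kubernetes documentation for the Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up and kept within the configured minimum and maximum. The sketch below illustrates that rule only; the function name and the sample utilization values are assumptions for illustration, not measurements from this experiment, and the real controller additionally applies a tolerance band and stabilization windows that are ignored here.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_pods=1, max_pods=10):
    """Replica count the HPA rule would ask for, clamped to [min_pods, max_pods].

    Simplified: ignores the controller's tolerance and stabilization behaviour.
    """
    raw = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_pods, min(max_pods, raw))


# Hypothetical load: 4 pods averaging 28% CPU.
print(desired_replicas(4, 28, 14, max_pods=150))  # target 14%
print(desired_replicas(4, 28, 8, max_pods=150))   # lower target of 8% asks for more pods
print(desired_replicas(4, 28, 8, max_pods=10))    # capped by the configured maximum
print(desired_replicas(4, 3, 14))                 # load drops, so the deployment scales down
```

For the same observed load, lowering the target from 14% to 8% raises the requested replica count, which is the effect noted in the observations above.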
The CPU resource utilization of the four-node cluster is shown below. It shows that the newly created pods were allocated equally across the worker nodes. The graph below is as seen in the Stackdriver utility.

Fig 5 CPU usage of a Kubernetes cluster

Fig 6 Load distribution over the worker nodes

Mesos System:

Application scalability in Mesos using Marathon is represented as the number of instances that are created and successfully run on the active agent nodes. Marathon provides an option to scale the number of application instances distributed over the agent nodes, through the Scale option in the user interface dashboard. Applications are specified as JSON files, either through the Create Application option of the Marathon UI or through a JSON file in Git which is imported, built and deployed over Marathon for distribution and scheduling, using the continuous integration tool Jenkins.

Deploying an application:

Mesos with Marathon forms an orchestration tool for managing application instances on the different active agent nodes. These nodes are managed by a master instance, which in turn is managed by the ZooKeeper processes. The distribution of application instances over agent nodes depends on the resources allocated to each of the agents. For this implementation, we consider an application that is not CPU intensive. This application stands in for any data-intensive application that is based on a request-response model. Table 7 summarizes the different scenarios simulated to test the scalability of the application, each with the configuration employed. The cluster configuration gives the number of active agent nodes, as the number of masters remains at 3.
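As a rough illustration of how these scenarios relate to a Marathon application definition, the sketch below builds a minimal JSON definition and estimates the instance ceiling for each configuration in Table 7. It assumes that CPU is the only limiting resource, so the ceiling is simply the available CPU divided by the per-instance CPU share. The fields id, cmd, cpus, mem and instances are standard Marathon application fields; the application id and command are hypothetical placeholders, not the application used in this project.

```python
import json


def max_instances(effective_cpus, cpu_percent_per_instance):
    """Instance ceiling when CPU is the only constraint (integer arithmetic)."""
    return (effective_cpus * 100) // cpu_percent_per_instance


def marathon_app(app_id, cpu_percent, mem_mb, instances):
    """Minimal Marathon-style application definition as a Python dict."""
    return {
        "id": app_id,
        "cmd": "python3 -m http.server 8080",  # hypothetical placeholder command
        "cpus": cpu_percent / 100,             # fraction of one CPU per instance
        "mem": mem_mb,                         # MB per instance
        "instances": instances,
    }


# (effective CPUs, CPU % per instance, memory MB per instance), as in Table 7
scenarios = [(4, 10, 32), (5, 10, 32), (4, 2, 10), (5, 2, 10)]
limits = [max_instances(cpus, pct) for cpus, pct, _ in scenarios]
print(limits)

app = marathon_app("/demo-app", cpu_percent=10, mem_mb=32, instances=limits[0])
print(json.dumps(app, indent=2))
```

Under this assumption the computed ceilings match the maximum instance counts reported in Table 7, which suggests that CPU, rather than memory, was the binding resource in these scenarios.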
Sl No   Cluster         Effective CPUs   CPU usage per   Memory usage per application   Maximum instances scaled
        configuration   available        instance (%)    instance (MB)                  for the CPU available
1       2 agents        4                10              32                             40
2       3 agents        5                10              32                             50
3       2 agents        4                2               10                             200
4       3 agents        5                2               10                             250

Table 7: Scalability analysis with a Data Intensive application

The tabulated results indicate that the Mesos cluster operates effectively with the Marathon scheduling framework, which suggests that a Mesos cluster system scales easily. When more application instances need servicing, a Mesos cluster starts new agent processes and distributes the application load over the running agents.

Load Balancing

An increase in the number of application instances requires more agent nodes to be running to service all the requests. However, request handling is not efficient if all of the requests are directed to a single agent, so the workload is distributed among all the agent nodes. For the scaling test scenarios described in the previous section, CPU usage was monitored using the Google Stackdriver utility. The graph below shows CPU usage at different times. The rapid rise or fall in usage is attributable to the increasing or decreasing number of application instances that need servicing.

Fig 7 CPU usage of a Mesos cluster with changes in application instances

The distribution of workload over all the processes, as tabulated by the Stackdriver utility, is illustrated in the figure below.

Fig 8 Load distribution over the six processes

There is a significant workload on master node 3, as the Marathon process utilizes the core of VM3 even though the process is run on VM1. Master nodes have the least CPU usage, because the only operation they perform is the distribution of application tasks over the agents. The agents are represented as three processes named mesos-slave-1, mesos-slave-2 and mesos-slave-3. The workload distributed over these appears even.
However, agent 3 runs on only a single core, and it uses 22.9% of its total allocated core. This summarizes the effective load balancing that a Mesos system provides.

Failover

A Mesos cluster system runs additional master processes on standby to facilitate automatic failover of the system. In this experiment, as an initial condition, the ZooKeeper quorum elected mesos-master-2 as primary, with mesos-master-1 and mesos-master-3 as secondary. Application deployment was initiated as per the previous procedure, using a JSON file through Marathon. The active tasks on mesos-master-2 were checked at port 5050 of its external IP address, to verify the delegation of tasks to the active agent nodes. To test failover, the mesos-master-2 process was killed. It was observed that, with ZooKeeper present, the application deployments over the agents were switched over to mesos-master-1. The delegation of tasks to the agents was then observed through the browser on the external IP address of mesos-master-1 at port 5050.

With this project, the implementation and experimentation enabled a better understanding of the concepts of orchestration and containerization, and of the scalability and load balancing properties of a cluster-based environment. This will ease the initial understanding of deploying and managing cloud-native applications, and help in setting up the environment that houses them. With the help of this documentation, along with the link provided through GitHub, it should be easier to set up an orchestration environment, as the team has tried to collate the steps involved in implementing a cluster with orchestration tools. Through research and experimentation, the team was able to put together enough literature to understand, compare, contrast and draw conclusions on various aspects of orchestration systems, and to understand the major differences between Kubernetes-based and Mesos-based systems.
The lack of resources on implementing Mesos-based systems, and the unclear distinction among the several example implementations, called for a better compilation of materials, which was achieved through this project work.

In a survey of some of the well-known orchestration tools conducted by P. Heidari et al. [7], with a primary focus on QoS capabilities, the authors concluded that not all of the tools guarantee healthy running replicas to effectively maintain quality of service. They noted that tools like Marathon and Fleet tend to go into an indefinite wait state when appropriate resources are not available. There is a need of an elasticity e
