Building Auto-Scaling, Resilient, and Load-Balanced Systems in a Cloud-Native Environment

Vikas Shivpuriya
5 min read · Aug 25, 2019

A well-designed cloud infrastructure scales with the demand of your application and lets you build and run applications that are both scalable and resilient. To keep up, your load balancing solution must scale along with it. On the application front, a well-designed application must scale seamlessly with changes in demand and be resilient enough to withstand the loss of one or more compute resources.

How to define scalability and resilience

A scalable application is one that works as well for a single user as it does when the user count grows into the thousands or even millions, and that handles changes in demand gracefully. By adding or removing compute resources as needed, a scalable application consumes only the resources required to meet current demand.

Resilience, on the other hand, is the ability to withstand the unexpected. A resilient application is one that continues to function well in spite of unexpected (or sometimes expected) failures of system components. A well-planned architecture is a key attribute of a truly resilient application.

At a high level, designing a system architecture for a scalable and resilient application involves:

A. Use load balancers to distribute traffic across healthy servers that can best handle the requests, and to monitor server health

B. Configure a robust storage solution

C. Host resources in multiple regions

Auto scaling

The purpose of auto scaling is to dynamically increase or decrease the number of virtual machines based on incoming demand. Auto scaling is normally configured through a scaling policy, and the policy can be defined to be triggered by multiple events; in an AWS environment, for example, by CloudWatch metric alarms, a schedule, or anything that can make an API call. Policies allow applications to scale their capacity based on real-time demand or on planned usage, such as an expected enrollment event for an insurance company. Within the infrastructure design, as capacity is increased or decreased, the nodes being added or removed must be registered or deregistered with the load balancing solution, and for the process to work well, this step must be automated too.

Because auto scaling may result from utilization metrics or expected schedules, machines or containers being added will do nothing to serve more load unless the load balancer sending them requests is notified. This matters equally when nodes are removed by a scale-in event: if they are not deregistered, the load balancer will continue to route incoming requests to them, causing failed requests and a poor user experience. The basic premise of auto scaling, that the application's resources can grow and shrink, requires that you have:

- A process, and events, by which you can add or remove instances from service. As part of the configuration, you also need a way of deciding when an instance should be added and when one should be removed.

- A process to store any stateful data. Instances are dynamic, so it is not recommended to store stateful data on them. A scalable application architecture solves this problem by keeping state in a separate storage service.
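The decision-plus-registration loop described above can be sketched in a few lines. This is a minimal, provider-agnostic illustration: the thresholds, the `autoscale` function, and the node names are all made up, and the launch/terminate steps are stubbed out where a real system would call its cloud provider and load balancer APIs.

```python
# Illustrative thresholds for a utilization-driven scaling policy.
SCALE_OUT_THRESHOLD = 70.0  # average CPU % above which we add a node
SCALE_IN_THRESHOLD = 30.0   # average CPU % below which we remove a node

def autoscale(pool, avg_cpu, min_nodes=1, max_nodes=10):
    """Adjust the node pool and keep the load balancer in sync."""
    if avg_cpu > SCALE_OUT_THRESHOLD and len(pool) < max_nodes:
        node = f"node-{len(pool) + 1}"
        pool.append(node)               # launch instance (stubbed)
        return ("registered", node)     # then register it with the LB
    if avg_cpu < SCALE_IN_THRESHOLD and len(pool) > min_nodes:
        node = pool.pop()               # terminate instance (stubbed)
        return ("deregistered", node)   # deregister BEFORE terminating
    return ("no-op", None)

pool = ["node-1", "node-2"]
print(autoscale(pool, 85.0))  # ('registered', 'node-3')
print(autoscale(pool, 10.0))  # ('deregistered', 'node-3')
```

Note that the scale-in branch returns the deregistration before the node disappears; in practice, draining and deregistering must complete before the instance is terminated, for exactly the reason described above.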

Resiliency

For a solution architecture to be resilient, it needs to automatically replace nodes or instances that have failed or become unavailable. When a new instance is launched, it should be fed the information it needs to understand its role in the system, configure itself automatically, discover the system's dependencies, and then start handling requests without manual intervention.
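A hedged sketch of that replace-on-failure loop, assuming a hypothetical `reconcile` function where the health probe and the provisioning call are passed in as placeholders for real checks and cloud APIs:

```python
def reconcile(pool, is_healthy, desired_count, launch):
    """Drop failed nodes, then launch replacements up to desired_count."""
    survivors = [n for n in pool if is_healthy(n)]
    while len(survivors) < desired_count:
        # A freshly launched node is expected to self-configure,
        # discover its dependencies, and then start serving traffic.
        survivors.append(launch())
    return survivors

failed = {"node-2"}
pool = reconcile(
    ["node-1", "node-2", "node-3"],
    is_healthy=lambda n: n not in failed,
    desired_count=3,
    launch=lambda: "node-4",
)
print(pool)  # ['node-1', 'node-3', 'node-4']
```

Cloud auto-scaling groups run essentially this reconciliation continuously: the desired count is declarative, and unhealthy instances are replaced without operator action.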

Load Balancing Algorithms

Load balancing improves overall system performance by distributing workloads among different nodes. Through effective load balancing, every virtual machine in the cloud system processes a comparable amount of work. A good load balancing algorithm therefore maximizes throughput while minimizing response time: an efficient algorithm ensures a uniform distribution of load across nodes, improves overall system performance, and contributes to system stability and faster response times.

Load balancing algorithms are classified into two classes: static and dynamic. A static algorithm does not react to the runtime state of the nodes; it distributes requests according to a fixed set of rules based on the input requests and prior knowledge of the system. A dynamic load balancing algorithm, also known as a self-adaptive algorithm, checks the previous and current state of each node and adjusts the traffic distribution in real time.

a. Static algorithms: recommended for homogeneous and stable environments. Examples include the Round Robin, Min-Min, and Min-Max load balancing algorithms.

b. Dynamic algorithms: recommended for heterogeneous environments. Examples include the Honeybee Foraging Behavior and Throttled load balancing algorithms.

Here is a short description of each algorithm type:

Round Robin algorithm: the load balancer allocates jobs to all nodes in a circular fashion, each node receiving a request in turn. Because compute resources are assigned in circular order, every node receives the same number of requests, though not necessarily the same amount of work.
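The circular assignment can be shown in a couple of lines with `itertools.cycle`; the node names are illustrative:

```python
from itertools import cycle

# Round-robin dispatch: each successive request goes to the next node
# in circular order, wrapping back to the first.
nodes = ["node-1", "node-2", "node-3"]
next_node = cycle(nodes)

assignments = [next(next_node) for _ in range(7)]
print(assignments)
# ['node-1', 'node-2', 'node-3', 'node-1', 'node-2', 'node-3', 'node-1']
```

Each node gets an equal share of requests, but if requests vary widely in cost, equal request counts do not guarantee equal load, which is one reason dynamic algorithms exist.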

Min-Min algorithm: the load balancer maintains a list of pending tasks and calculates the minimum completion time of each task across all available nodes. The task with the smallest minimum completion time is scheduled first, on the node that yields that minimum.

Min-Max algorithm: as in Min-Min, the minimum completion time is calculated for each task across all available nodes, but the task with the largest of these minimum completion times is scheduled first, on the node that yields its minimum. This prevents long tasks from being starved behind many short ones.
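The two selection rules differ only in the final min/max step, which the following sketch makes concrete. The completion-time matrix, task names, and helper functions are all made up for illustration:

```python
# Illustrative completion-time matrix: completion[task][node] is the
# estimated time to finish that task on that node.
completion = {
    "t1": {"n1": 4, "n2": 6},
    "t2": {"n1": 9, "n2": 7},
    "t3": {"n1": 2, "n2": 8},
}

def best_node(task):
    """Node giving the minimum completion time for the task."""
    return min(completion[task], key=completion[task].get)

def min_time(task):
    return completion[task][best_node(task)]

# Min-Min: schedule the task whose minimum completion time is smallest.
min_min_task = min(completion, key=min_time)
# Min-Max (often called Max-Min): schedule the task whose minimum
# completion time is largest, so big tasks are not starved.
min_max_task = max(completion, key=min_time)

print(min_min_task, best_node(min_min_task))  # t3 n1
print(min_max_task, best_node(min_max_task))  # t2 n2
```

In a full scheduler, the chosen task is removed, the node's expected ready time is updated, and the selection repeats until the task list is empty.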

Honeybee Foraging Behavior algorithm: this algorithm mimics the food-foraging behavior of honeybees. It works in a preemptive manner and considers task priority when migrating tasks from one node to another, using multi-objective optimization both to select the optimal node for load balancing and to assign priorities to tasks.
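A greatly simplified sketch of the migration step only: real honeybee-inspired algorithms use scout/forager heuristics and multi-objective scoring, while this toy version (the `migrate_one` function and node names are invented) just shows priority-aware migration from the most loaded node to the least loaded one.

```python
def migrate_one(nodes):
    """nodes: {name: [(task, priority), ...]}; moves one task, mutating in place."""
    load = {n: len(tasks) for n, tasks in nodes.items()}
    src = max(load, key=load.get)   # most loaded (overloaded) node
    dst = min(load, key=load.get)   # least loaded (underloaded) node
    if load[src] - load[dst] <= 1:
        return None                 # pool is already balanced
    # Preempt the highest-priority waiting task and migrate it.
    task = max(nodes[src], key=lambda t: t[1])
    nodes[src].remove(task)
    nodes[dst].append(task)
    return (task[0], src, dst)

nodes = {"n1": [("a", 1), ("b", 5), ("c", 2)], "n2": []}
print(migrate_one(nodes))  # ('b', 'n1', 'n2')
```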

Throttled load balancing algorithm: the load balancer maintains an index table of available nodes and their availability states. For each incoming request, it scans the index table to find the first available node and allocates that node to serve the request.
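The index-table scan is easy to sketch; the table contents and the one-request-per-node throttle below are illustrative simplifications:

```python
# Index table of nodes and their availability states (made-up entries).
index_table = {"n1": "busy", "n2": "available", "n3": "available"}

def allocate(table):
    """Scan the index table and return the first available node."""
    for node, state in table.items():
        if state == "available":
            table[node] = "busy"  # throttled: node now unavailable
            return node
    return None  # every node is busy; the request must wait or be rejected

print(allocate(index_table))  # n2 (first available entry)
print(allocate(index_table))  # n3
print(allocate(index_table))  # None
```

When a node finishes its work it notifies the balancer, which flips its entry back to available; the throttle caps concurrent work per node at the cost of queuing requests when the table is exhausted.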

While selecting the right algorithm is important, it is also important to understand how adding or removing a node from the pool redistributes the workload across the remaining nodes. The round robin algorithm distributes load evenly across whatever nodes are in the pool, so adding or removing nodes has little to no disruptive impact on the workload distribution.

With algorithms that distribute load using a hash table, similar requests are very likely to be directed to the same node. This can be useful for certain business use cases, such as keeping a persistent session for the end user, but it is not ideal for achieving auto scaling, because changing the number of nodes changes the hash mapping.
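A small experiment shows why naive hashing fights auto scaling. The session IDs and node counts below are made up, and a crc32-modulo stands in for whatever hash function the balancer actually uses:

```python
from zlib import crc32

def node_for(session_id, num_nodes):
    """Map a session to a node with a simple modulo hash."""
    return crc32(session_id.encode()) % num_nodes

sessions = [f"user-{i}" for i in range(100)]
before = {s: node_for(s, 3) for s in sessions}
after = {s: node_for(s, 4) for s in sessions}  # scale out by one node

moved = sum(1 for s in sessions if before[s] != after[s])
print(f"{moved} of 100 sessions land on a different node")
```

Growing the pool from three to four nodes remaps roughly three quarters of the sessions in this sketch, breaking their affinity. This is why consistent hashing, where only about 1/n of keys move when a node is added, is the usual compromise when session affinity and auto scaling must coexist.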
