Scalability and elasticity: what you need to take your business to the cloud

By 2025, 85% of companies are expected to embrace a cloud-first principle – hosting data in the cloud rather than on-premises for greater efficiency. The shift to cloud computing, accelerated by COVID-19 and remote working, has delivered a slew of benefits to businesses: lower IT costs, increased efficiency and reliable security.

As this trend continues, the threat of service disruptions and outages is also increasing. Cloud providers are very reliable, but they are “not immune to failure.” In December 2021, Amazon reported that multiple Amazon Web Services (AWS) APIs were affected by an outage, and many commonly used websites went down within minutes.

So, how can companies mitigate cloud risk, prepare for the next AWS shortage, and deal with sudden spikes in demand?

The answer is scalability and elasticity – two essential aspects of cloud computing that greatly benefit businesses. Let’s talk about the differences between scalability and elasticity, and see how they can be built in at the cloud infrastructure, application and database levels.

Understand the difference between scalability and elasticity

Both scalability and elasticity relate to the number of requests a cloud system can serve simultaneously. They are not mutually exclusive; both may need to be supported, and each is addressed separately.

Scalability is the ability of a system to remain responsive as the number of users and the volume of traffic gradually increase over time. Long-term growth is therefore strategically planned for. Most B2B and B2C applications with growing usage require scalability to ensure reliability, high performance and uptime.

With a few minor configuration changes and the click of a button, a business can scale its cloud system up or down within minutes. In many cases, cloud platforms can automate this, applying scale factors at the server, cluster and network levels and reducing engineering labor costs.
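
As a rough sketch of what that automation can look like – assuming an AWS environment, the boto3 library and an existing Auto Scaling group, with the group name and CPU target below as hypothetical placeholders – a single target-tracking policy is enough to let the platform add and remove servers on its own:

```python
# A minimal sketch of automated scaling on AWS, assuming boto3 is
# installed and credentials are configured. The group name and the
# 50% CPU target are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

# Target-tracking keeps average CPU near the target by adding or
# removing instances automatically - no manual intervention needed.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",   # hypothetical group name
    PolicyName="keep-cpu-near-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,               # hypothetical CPU target
    },
)
```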

Elasticity is the ability of a system to remain responsive during short bursts or sudden, high peaks in load. Some examples of systems that regularly face elasticity challenges include NFL ticketing applications, auction systems and insurance companies during natural disasters. In 2020, the NFL leaned on AWS to livestream its virtual draft, when it needed far more cloud capacity than usual.

A company that faces unpredictable workloads but doesn’t want a pre-planned scaling strategy may want an elastic solution in the public cloud, which carries lower maintenance costs: the infrastructure is managed by a third-party provider and shared with multiple organizations over the public internet.

So, does your company have predictable workloads, highly variable workloads, or both?

Work out scaling options with cloud infrastructure

When it comes to scalability, companies need to watch out for over-provisioning or under-provisioning. This happens when technical teams fail to gather quantitative data about application resource requirements, or when the back-end scaling plan is not aligned with business goals. To arrive at the right solution, continuous performance testing is essential.

Business leaders reading this should talk to their technical teams to find out how they derive their cloud provisioning schemes. IT teams should continuously measure response time, request counts, CPU load and memory usage to track the cost of goods (COG) associated with cloud usage.
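
As an illustrative sketch of that kind of measurement – not a production monitoring setup – a team could sample CPU and memory with the psutil library and derive a rough cost-per-request figure; the hourly rate and request count below are hypothetical inputs:

```python
# Illustrative only: sample CPU/memory and estimate cost per request.
# Requires psutil (pip install psutil). The hourly instance cost and
# the request count are hypothetical inputs a real system would pull
# from billing data and access logs.
import psutil

HOURLY_INSTANCE_COST_USD = 0.096    # hypothetical on-demand rate
requests_served_this_hour = 42_000  # hypothetical, from access logs

cpu_pct = psutil.cpu_percent(interval=1)   # CPU load over 1 second
mem_pct = psutil.virtual_memory().percent  # memory currently in use

cost_per_request = HOURLY_INSTANCE_COST_USD / requests_served_this_hour
print(f"CPU {cpu_pct}%, memory {mem_pct}%, "
      f"~${cost_per_request:.6f} per request")
```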

Various scaling techniques are available to organizations based on business needs and technical constraints. So, are you going to scale up or scale out?

Vertical scaling covers scaling up and down and is used for monolithic applications, often built before 2017, which may be difficult to refactor. It involves adding more resources, such as RAM or processing power (CPU), to your existing server when you have an increased workload – which means scaling is capped by the capacity of a single server. It requires no changes to the application architecture, as you move the same application, files and database to a larger machine.
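
Here is a hedged sketch of vertical scaling on AWS with boto3; the instance ID and the larger instance type are hypothetical. Note the stop/start around the resize – the downtime it implies is one practical limit of scaling up:

```python
# Sketch of vertical scaling on AWS: resize an EC2 instance to a
# larger type. The instance must be stopped first, which is why
# vertical scaling typically implies a maintenance window.
# Instance ID and target type below are hypothetical.
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # hypothetical instance

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

# Swap in a machine with more CPU and RAM; the application itself
# is unchanged - same code, files and database on bigger hardware.
ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m5.2xlarge"},  # hypothetical larger size
)
ec2.start_instances(InstanceIds=[instance_id])
```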

Horizontal scaling covers scaling in and out: adding more servers to the original cloud infrastructure so that they operate as one system. Each server must be independent so that servers can be added or removed individually. This brings many architectural and design considerations around load balancing, session management, caching and communication, and legacy applications that were not designed for distributed computing must be carefully adapted. Horizontal scaling is especially important for businesses running high-availability services that require minimal downtime along with high performance, storage and memory.
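
To make the independence requirement concrete, here is a toy round-robin dispatcher in Python – a deliberately simplified stand-in for a real load balancer, with illustrative server names. Because no server holds session state, any server can be added or removed without breaking users:

```python
# Toy illustration of horizontal scaling: a round-robin dispatcher
# over interchangeable, stateless servers. Real systems use a load
# balancer (e.g., an ALB or nginx); this sketch only shows why
# server independence matters - any server can handle any request.
import itertools

servers = ["app-1", "app-2", "app-3"]  # illustrative server pool

def handle(request: str, server: str) -> str:
    # Stateless handling: no session data lives on the server, so
    # removing "app-2" tomorrow would not break any user's session.
    return f"{server} handled {request}"

dispatcher = itertools.cycle(servers)
for req in ["GET /patients", "POST /appointments", "GET /billing"]:
    print(handle(req, next(dispatcher)))
```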

If you’re not sure which scaling technique is best for your business, you may need to consider a third-party cloud engineering automation platform to help manage your scaling needs, goals, and deployment.

Consider how application architectures affect scalability and elasticity

Let’s take a simple healthcare application – the same reasoning applies to many other industries – and see how it can be built on different architectures, and how that choice affects scalability and elasticity. Healthcare came under severe pressure during the COVID-19 pandemic and had to scale dramatically; it could have benefited greatly from cloud-based solutions.

At a high level, there are two types of architecture: monolithic and distributed. Monolithic architectures (including layered, modular monolith, pipeline and microkernel styles) are not natively built for efficient scalability and elasticity – all modules reside in the main body of the application, and as a result the entire application is deployed as a single unit. There are three main types of distributed architecture: event-driven, microservices and space-based.

The simple healthcare application has:

  • Patient portal – for patients to register and book appointments.
  • Physician portal – for medical staff to view medical records, perform medical exams and prescribe medications.
  • Office portal – for the accounting department and support staff to collect payments and answer questions.

The hospital’s services are in high demand, and to support growth, it needs to scale the patient registration and appointment-scheduling modules. This means scaling only the patient portal, not the physician or office portals. Let’s see how this application could be built on each architecture.

Monolithic architecture

Technology startups, including those in healthcare, often choose this traditional, unified software design model for its speed-to-market advantage. But it is not an optimal solution for companies that need scalability and elasticity, because there is one integrated instance of the application and a centralized single database.

For application scaling, adding more instances of the application behind a load balancer scales out all three portals together – the physician and office portals as well as the patient portal – even though the business doesn’t need that.

Most monolithic applications use a monolithic database – one of the most expensive cloud resources. Cloud costs grow exponentially with scale, and this arrangement is expensive, especially in terms of maintenance time for development and operations engineers.

Another aspect that makes monolithic architectures incapable of supporting elasticity and scalability is mean-time-to-startup (MTTS) – the time it takes for a new instance of the application to start. It usually takes a few minutes due to the large size of the application and database: engineers must create the support functions, dependencies, objects, and connection pools, and ensure security and connectivity to other services.

Event-driven architecture

Event-driven architecture is better suited to scaling and elasticity than monolithic architecture. In this model, a service publishes an event when something notable happens. Think of shopping on an e-commerce site during a busy period: you order an item, and only later receive an email saying it is out of stock – the order was accepted asynchronously rather than processed on the spot. Asynchronous messaging and queuing provide back pressure, allowing the front-end to scale without scaling the back-end, by queuing requests.
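
A minimal sketch of that back-pressure idea, using Python’s bounded in-process queue as a stand-in for a real message broker (queue size and timings are illustrative):

```python
# Minimal back-pressure sketch: a bounded queue lets the front-end
# accept bursts while a slower back-end drains at its own pace.
# In production this would be a broker such as Kafka or SQS; the
# queue size and timings here are illustrative.
import queue
import threading
import time

orders = queue.Queue(maxsize=100)  # bounded: producers block when full

def front_end(order_id: int) -> None:
    # put() blocks if the queue is full - that blocking IS the
    # back pressure protecting the back-end from overload.
    orders.put(order_id)
    print(f"accepted order {order_id}")

def back_end() -> None:
    while True:
        order_id = orders.get()
        time.sleep(0.1)            # simulate a slow stock check
        print(f"processed order {order_id}")
        orders.task_done()

threading.Thread(target=back_end, daemon=True).start()
for i in range(10):                # burst of incoming orders
    front_end(i)
orders.join()                      # wait for the back-end to drain
```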

In our healthcare application, this distributed architecture would mean each module is its own event processor, with the flexibility to distribute or share data across one or more modules. There is some flexibility at the application and database levels in terms of scale, since services are no longer coupled.

Microservices architecture

This architecture views each service as a single-purpose service, allowing businesses to scale each service independently and avoid unnecessary use of valuable resources. For database scaling, the persistence layer can be designed exclusively for each service and set up for individual scaling.
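
As a hedged example of scaling one service independently, this is roughly what registering only the patient portal for auto-scaling could look like on AWS ECS via boto3; the cluster name, service name and capacity bounds are hypothetical:

```python
# Sketch: scale ONE microservice (the patient portal) independently,
# using AWS Application Auto Scaling for an ECS service. The cluster
# name, service name and capacity bounds are hypothetical.
import boto3

aas = boto3.client("application-autoscaling")

# Only the patient portal is registered; the physician and office
# services keep their fixed, smaller footprints.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/clinic-cluster/patient-portal",  # hypothetical
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)
```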

Like event-driven architecture, microservices cost more in cloud resources than monolithic architectures at low usage levels. However, under increasing load, in multitenant deployments and in cases of traffic bursts, they are more economical. MTTS is also very short, often measured in seconds, thanks to the fine-grained services.

However, given the large number of services and their distributed nature, debugging can be more difficult and maintenance costs can be increased if services are not fully automated.

Space-based architecture

This architecture is based on a principle called tuple space processing: multiple parallel processors with shared memory. It maximizes both scalability and elasticity at the application and database levels.

All application interactions take place with an in-memory data grid. Calls to the grid are asynchronous, and event processors can scale independently. For database scaling, a background data writer reads queued changes and updates the database: all insert, update and delete operations from the associated services are sent to the data writer and queued for persistence.
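
A simplified sketch of that write-behind flow – the grid, queue and “database” below are toy stand-ins for what platforms like GigaSpaces or Hazelcast provide:

```python
# Toy write-behind sketch: services update an in-memory grid and
# enqueue the change; a background data writer persists it later.
# Real space-based platforms use replicated grids and real storage;
# this stand-in only shows the data flow.
import queue
import threading

grid: dict[str, str] = {}   # in-memory data grid (toy)
pending = queue.Queue()     # queued inserts/updates/deletes
database: dict[str, str] = {}  # stand-in for the real database

def service_update(key: str, value: str) -> None:
    grid[key] = value       # reads and writes hit memory only
    pending.put((key, value))  # change queued for persistence

def data_writer() -> None:
    while True:
        key, value = pending.get()
        database[key] = value  # asynchronous write to storage
        pending.task_done()

threading.Thread(target=data_writer, daemon=True).start()
service_update("patient:42", "appointment=2022-07-19")
pending.join()
print(grid["patient:42"], database["patient:42"])
```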

MTTS is extremely fast and usually takes a few milliseconds because all data interactions occur with data in memory. However, all services must connect to the broker and the first cache load must be made with a data reader.

In this digital age, companies want to increase or decrease IT resources as needed to meet changing demands. The first step is moving from large monolithic systems to distributed architecture to gain a competitive advantage – this is what Netflix, Lyft, Uber and Google have done. However, the choice of architecture is subjective and decisions should be made based on developer capabilities, average load, peak load, budget constraints, and business growth goals.

Sashank is a serial entrepreneur with a keen interest in innovation.
