The Pros and Cons of Database Scaling Options
As cloud and container-based environments become ubiquitous, organizations are increasingly moving away from high-performance static hardware to dynamically changing, virtual, commodity infrastructure. Systems need to expand and contract easily to meet an ever-changing demand – and that includes the database.
Database scaling is often implemented by clustering. Here’s a quick overview of the pros and cons of traditional architectural implementations:
- Shared-disk offers potentially lower cost, extensibility, availability, load balancing, and relatively simple migration from a centralized system. However, shared-disk tends not to scale as well as shared-nothing, and it often relies on storage area networks (SANs), which can drive up the cost.
- Shared-nothing offers improved scalability, but requires a partitioning scheme to apportion the data across the nodes of the database. The DBMS must know how to partition and access the data based on that scheme. Repartitioning is a non-trivial exercise that is not conducive to 24/7 processing because it requires DBA and programmer effort as well as database downtime, which causes application outages.
- Database sharding is often used with a shared-nothing approach to automate partitioning and management. The primary benefit is that the processing workload against a large partitioned table can be spread across multiple servers. However, instead of a single database to manage, there are now multiple databases, each with its own resources to manage. Additionally, sharding can negatively impact fault tolerance, which is why sharding is often accompanied by replication.
- Replication is often used with shared-nothing and sharding to bolster fault tolerance. Replication across multiple servers can be easy to set up, but ongoing administration and management are required. Replication also requires additional storage (for each replica), as well as I/O and CPU resources to support the data replication process.
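To make the partitioning and replication trade-offs above more concrete, here is a minimal Python sketch. It assumes a simple hash-modulo partitioning scheme, which is only one of several possible schemes; the function names (`shard_for`, `replica_nodes`, `moved_fraction`) and the `customer-N` keys are hypothetical and chosen for illustration. It shows how a key maps to a node, how replicas might be placed on neighboring nodes, and why adding a node under this scheme forces most rows to move.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node using a stable hash (assumed hash-modulo scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def replica_nodes(key: str, num_nodes: int, copies: int = 2) -> list[int]:
    """Place the primary copy on the key's node and replicas on the next nodes."""
    primary = shard_for(key, num_nodes)
    return [(primary + i) % num_nodes for i in range(copies)]


def moved_fraction(keys: list[str], old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose home node changes when the node count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes)
    )
    return moved / len(keys)


# Hypothetical keys, used only to measure data movement.
keys = [f"customer-{i}" for i in range(10_000)]

# Growing from 4 to 5 nodes relocates the large majority of keys under
# hash-modulo partitioning, which is why repartitioning is disruptive.
fraction = moved_fraction(keys, old_nodes=4, new_nodes=5)
print(f"keys that must move: {fraction:.0%}")
```

Schemes such as consistent hashing or range partitioning with split points reduce the amount of data that moves, at the cost of more routing metadata to maintain.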
Today, there is also a new, services-based alternative to the traditional approaches, which grew out of a client-server mindset. Called elastic SQL, this approach retains SQL and ACID guarantees for data integrity, but with an architecture that can elastically scale up and down as needed.
One elastic SQL variant uses multi-version concurrency control in a peer-to-peer architecture that splits transaction processing and storage into two independently scalable tiers. The transactional tier provides an in-memory, on-demand cache distributed across multiple servers and potentially even data centers. The storage tier uses a set of peer-to-peer coordination messages to manage commit processing and to access data when it is not available in the transactional cache.
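The idea behind multi-version concurrency control can be illustrated with a small Python sketch. This is a single-process toy under stated assumptions, not how any particular elastic SQL product implements it: the `MVCCStore` class and its methods are hypothetical, every write commits immediately, and there is no conflict detection or distribution across peers. It shows only the core property that a reader keeps seeing the version that was current when its snapshot was taken, even after a later write.

```python
import itertools


class MVCCStore:
    """Toy versioned key-value store (illustrative only)."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] ascending
        self._clock = itertools.count(1)  # monotonically increasing commit timestamps
        self._last_commit = 0

    def begin(self) -> int:
        """Return a snapshot timestamp covering everything committed so far."""
        return self._last_commit

    def write(self, key, value) -> int:
        """Append a new version instead of overwriting the old one."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
snapshot = store.begin()        # a reader starts here
store.write("balance", 80)      # a later writer does not block the reader

old_view = store.read("balance", snapshot)       # still the value 100
new_view = store.read("balance", store.begin())  # the value 80
```

Because readers never wait on writers, versions can be cached in memory on many peers at once, which fits the in-memory transactional tier described above.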
In elastic SQL, database performance, latency, redundancy, and availability can be adjusted by bringing peers online and offline as needed, with no downtime. No single peer is wholly responsible for a given task or piece of data, so any peer can fail or be purposefully shut down without impacting service availability. The drawback, however, is that today’s elastic SQL systems excel at OLTP workloads and at mixed OLTP and real-time analytic workloads, but not at pure OLAP ones.
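As a rough illustration of adjusting capacity by bringing peers online and offline, the following Python sketch models a pool of interchangeable peers with round-robin routing. The `PeerPool` class, its method names, and the peer names are hypothetical, and the sketch assumes any online peer can serve any request; a real system would also need membership agreement and data movement between peers.

```python
class PeerPool:
    """Toy pool of interchangeable peers (illustrative only)."""

    def __init__(self, peers):
        self._online = list(peers)
        self._next = 0  # round-robin counter

    def bring_online(self, peer):
        """Add capacity without interrupting routing."""
        if peer not in self._online:
            self._online.append(peer)

    def take_offline(self, peer):
        """Remove a peer; the remaining peers absorb its share of requests."""
        if peer in self._online:
            self._online.remove(peer)

    def route(self, request):
        """Pick the next online peer for a request."""
        if not self._online:
            raise RuntimeError("no peers online")
        peer = self._online[self._next % len(self._online)]
        self._next += 1
        return peer


pool = PeerPool(["peer-1", "peer-2"])
pool.take_offline("peer-1")           # planned shutdown or failure
first = pool.route("SELECT 1")        # still served, by peer-2
pool.bring_online("peer-3")           # scale out under load
later = {pool.route("SELECT 1") for _ in range(4)}
```

Service continues as long as at least one peer remains online, which is the availability property the paragraph above describes.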
When considering options, be sure to understand the pros and cons of each of these methods before implementing them at your shop.
Craig is president & principal consultant of Mullins Consulting, Inc., a consultant for datAvail, and the publisher/editor of The Database Site. His experience spans multiple industries (banking, utilities, software dev, research and consulting).