Answers to Your 5 Questions about Database Scalability
Recently we hosted a webinar discussing the different scaling options for databases. There are a lot of databases on the market, so it’s important to choose one that offers the scalability and consistency you’re looking for, and that’s going to vary based on the application and deployment in question. To learn more about scale up vs. scale out, eventual vs. strong consistency, and the trade-offs different database solutions make to optimize scalability, please watch the webinar or read more about it in this white paper.
During the webinar, we had several excellent questions that I answer in this post. If you have more questions, please reach out.
1. Does scaling work out any differently in multi-cloud environments than it does in a single cloud deployment?
Conceptually, no. Scaling out nodes across clouds works the same as within a single cloud. But you need to think about the workload that is going to run in each cloud. If you have a multi-cloud environment with applications accessing the database in both clouds, it may make sense to scale out both clouds together if the workload is increasing on both. If one cloud is seeing more workload for whatever reason, such as locality, you may want to simply scale out the nodes in that one cloud rather than in both.
2. How long does it take to start up a new node in a scale out architecture?
Unfortunately, it really depends on the architecture, as discussed in our database scalability webinar. Consider scaling out a partitioned architecture: if you have three nodes and add a fourth, that new node is empty. It has no data. You’d need to redistribute data from the existing three nodes to the fourth node before it can start processing a workload. That could take minutes or it could take hours, depending on the size of the data being partitioned. Compare that to NuoDB’s durable cache architecture: if you add a transaction engine (TE), it can start processing data instantaneously, because its cache is built dynamically and automatically. If the data isn't in the cache, the TE automatically gets it from an adjacent node or from storage, depending on which is faster. So the time required to start up a new node really depends on the architecture; some architectures have very low startup costs in terms of how quickly that node can begin processing a workload.
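To make the redistribution cost in a partitioned architecture concrete, here is a minimal Python sketch. It assumes simple modulo hash partitioning, which is only one possible scheme and is not taken from the webinar; the `owner` and `keys_to_move` helpers and the `row-N` keys are hypothetical. It counts how many rows change owner when a cluster grows from three nodes to four.

```python
import hashlib


def owner(key: str, node_count: int) -> int:
    """Map a key to a node index using a stable hash (assumed scheme: hash mod N)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def keys_to_move(keys, old_nodes: int, new_nodes: int):
    """Return the keys whose owning node changes when the cluster is resized."""
    return [k for k in keys if owner(k, old_nodes) != owner(k, new_nodes)]


# Hypothetical data set: 10,000 row keys spread over three nodes.
keys = [f"row-{i}" for i in range(10_000)]
moved = keys_to_move(keys, old_nodes=3, new_nodes=4)

# Every row assigned to the new node (index 3) must be copied there first,
# because that node starts out empty.
to_new_node = [k for k in moved if owner(k, 4) == 3]

print(f"rows that change owner: {len(moved)} of {len(keys)}")
print(f"rows the new node must receive: {len(to_new_node)}")
```

With naive modulo partitioning, most rows change owner on a resize. Schemes such as consistent hashing reduce that movement, but the new node still has to receive its share of the data before it can serve it.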
3. How much cost savings might there be from switching from a scale up model to a scale out model?
You could see significant cost savings, but it's hard to say exactly how much. Let’s start by just talking about hardware costs. In the scale up model, you have one node that's your primary, which is processing all your reads and writes. For availability requirements, you also need a high availability (HA) server and a disaster recovery (DR) server. So, in the scale up architecture, you pay three times the capital expenditure to process the application workload on one node. In a scale out architecture, even if you have just three nodes, you’re actually utilizing all three nodes rather than holding two of them in reserve to meet availability requirements. So at a minimum, you can think about a 3X savings in capital costs for the same processing capacity with scale out architecture. And because you can provision dynamically as needed, rather than pre-provisioning for maximum demand, the cost savings with scale out architecture can be significantly larger still.
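The 3X figure can be checked with a little arithmetic. The sketch below is a simplification: it assumes every server costs the same and has the same capacity, and the price of 10,000 per node is a made-up placeholder. It compares the capital cost per node that is actually processing workload in each model.

```python
from fractions import Fraction


def cost_per_active_node(total_nodes: int, active_nodes: int, cost_per_node: int) -> Fraction:
    """Capital cost divided by the number of nodes actually serving the workload."""
    return Fraction(total_nodes * cost_per_node, active_nodes)


COST_PER_NODE = 10_000  # hypothetical price, identical for every server

# Scale up: primary + HA + DR servers, but only the primary processes the workload.
scale_up = cost_per_active_node(total_nodes=3, active_nodes=1, cost_per_node=COST_PER_NODE)

# Scale out: three nodes, all of them processing the workload.
scale_out = cost_per_active_node(total_nodes=3, active_nodes=3, cost_per_node=COST_PER_NODE)

ratio = scale_up / scale_out
print(f"scale up cost per active node:  {scale_up}")
print(f"scale out cost per active node: {scale_out}")
print(f"ratio: {ratio}")
```

The ratio depends only on how many nodes sit idle, not on the assumed price, which is why the comparison holds regardless of the actual hardware cost.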
4. Does the distributed architecture of NuoDB work in a hybrid model where SMs and TEs are spread across on-prem and the cloud?
Yes. Because we're cloud agnostic, NuoDB supports what is known as hybrid deployment, or an on ramp to the cloud if you will. Let's say, for example, you have a NuoDB database deployed on premises with two TEs and two storage managers (SMs), and you want to migrate to the cloud. With other databases, it's generally a lift and shift process. In that case, you have to bring up the cloud database, run it in parallel for a while, and then shift by turning off the on premises database and switching over to the cloud database. With NuoDB, it’s different. We've got a distributed architecture, so you can take your architecture of two TEs and two SMs and add maybe just a storage manager in the cloud.
NuoDB automatically replicates data between your on premises instance and your cloud provider, providing a copy of the database in the cloud. Maybe you’re just using that for DR purposes. There’s no workload there right now, but you have a copy and it's automatically synced. Now you have an environment that spans both on premises and cloud, which is known as hybrid. As the cloud environment solidifies, maybe you want to start moving workload to the cloud. You can dynamically add TEs to the cloud infrastructure. So, for example, now you have two TEs and two SMs in both your on premises and your cloud environments. Applications are being served by both environments. That’s what is known as active-active, and you have access to all your data in both the cloud and your on premises environment. You can continue to work in a hybrid environment, or if you want to migrate all the way to the cloud at this point, you can shut off your on premises environment. Now your applications are fully in the cloud. So NuoDB has a unique way of migrating smoothly to the cloud as well as supporting a hybrid model that runs across both on premises and cloud.
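The migration path described above can be written down as a sequence of topology changes. The following Python sketch is only a toy model for illustration: it does not use any NuoDB API, and the `apply_step` helper and the environment names are hypothetical. It simply tracks how many TEs and SMs run in each environment at each stage.

```python
from copy import deepcopy


def apply_step(topology, env, te=0, sm=0):
    """Return a new topology with TE/SM counts adjusted in one environment."""
    updated = deepcopy(topology)
    updated[env]["TE"] += te
    updated[env]["SM"] += sm
    if updated[env]["TE"] < 0 or updated[env]["SM"] < 0:
        raise ValueError("process count cannot be negative")
    return updated


# Starting point: everything runs on premises.
start = {"on_prem": {"TE": 2, "SM": 2}, "cloud": {"TE": 0, "SM": 0}}

# Step 1: add one SM in the cloud, giving a synced copy (for example, for DR).
dr_copy = apply_step(start, "cloud", sm=1)

# Step 2: add TEs (and a second SM) in the cloud, so both sides serve applications.
active_active = apply_step(dr_copy, "cloud", te=2, sm=1)

# Step 3: shut off the on premises environment to finish the migration.
cloud_only = apply_step(active_active, "on_prem", te=-2, sm=-2)

for name, topo in [("start", start), ("DR copy", dr_copy),
                   ("active-active", active_active), ("cloud only", cloud_only)]:
    print(f"{name}: {topo}")
```

The point of the model is that each stage is a small, incremental change to a running database rather than a parallel deployment followed by a cutover.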
5. Can you give an example of a use case and the related requirements that would be a good fit for a distributed SQL scale out option?
Applications that are best suited for distributed SQL are applications that don't fit into a NoSQL use case. Traditionally, existing applications have been written to work on Oracle, SQL Server, MySQL, or Postgres. These applications have been written such that they assume that the database supports ACID transactions and strict consistency. If you're migrating from a traditional, single-server SQL RDBMS to a distributed architecture, that's where the distributed SQL class of databases shines. These databases are meant to let you migrate existing applications to newer architectures. Again, there are tradeoffs between the different types of distributed SQL architectures. But if you're looking for the requirements where one class is better than the other, it's really where you're migrating existing applications. That's not to say NuoDB can't be used for new applications, for example, if you're building a new financial services application that requires transactions and guaranteed consistency. In this day and age, it'd be foolish for most customers to start with a traditional old school database. They should look at a cloud database or a distributed SQL database for new applications.