NewSQL – The Next Phase of Evolution in Database Technologies
You may have been hearing a lot about NewSQL lately and wondering what the buzz is all about. To explore it, we will walk through the history of database evolution that led to this new era in enterprise database management.
History – RDBMS and SQL Era
It all started in the first half of the 1970s, when IBM introduced SQL (Structured Query Language) to store and retrieve data. Many technology companies quickly adopted the new concept and introduced their own RDBMS (Relational Database Management System) implementations. Some of the major ones were Oracle Database from Oracle Corporation, DB2 and Informix from IBM, and their open-source counterpart, MySQL.
In the relational approach, data is stored in tables of rows and columns, and the database ensures ACID (Atomicity, Consistency, Isolation, and Durability) conformity. In the SQL model, data was stored on big database servers. During the explosion of internet-related enterprise systems in the '90s, however, the volume of data grew exponentially, and this paved the way for a new paradigm in DBMS, called NoSQL.
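To make the ACID idea concrete, here is a minimal sketch of atomicity, using Python's built-in sqlite3 module as a stand-in for a relational database. The `accounts` table, the names, and the `transfer` helper are assumptions made up for this illustration; they do not come from any particular product.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so both updates are undone
final = balances(conn)
```

The second transfer leaves no trace: neither the debit nor the credit survives the rollback, which is the "all or nothing" part of ACID.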
The NoSQL era
NoSQL emerged to satisfy the growing market demand for scalable data storage in a distributed environment, where copies of the same data are made available on different machines. The CAP theorem (Consistency, Availability, and Partition Tolerance) framed the design trade-offs behind NoSQL technology. Many NoSQL databases settle for eventual consistency: replicas accept reads and writes at the same time and converge on the same data afterwards.
With the proliferation of data sources and the evolution of data-centric technologies like Big Data and Artificial Intelligence, NoSQL databases matured and slowly started to penetrate the RDBMS market. Cassandra, MongoDB, and Redis have all remained near the top of the NoSQL listings, and many of you may be familiar with them. However, the needs of a diverse set of database users were never fully satisfied, and now NewSQL arrives with a promise to solve many such problems.
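The idea of replicas that diverge briefly and then converge can be sketched with a toy model. This is not how any specific NoSQL database is implemented; the `Replica` class, the integer versions, and the last-write-wins rule are assumptions chosen to keep the example short.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def write(self, key, value, version):
        # Last-write-wins: only accept a value newer than the one we hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

def sync(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for key in set(a.store) | set(b.store):
        for src, dst in ((a, b), (b, a)):
            if key in src.store:
                version, value = src.store[key]
                dst.write(key, value, version)

r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart", ["book"], version=1)         # accepted by r1 only
stale = r2.read("cart")                       # r2 has not seen the write yet
r2.write("cart", ["book", "pen"], version=2)  # a newer write lands on r2
sync(r1, r2)                                  # background reconciliation
converged = (r1.read("cart"), r2.read("cart"))
```

Between the write and the sync, a reader on `r2` sees stale data. That window is the price eventual consistency pays for staying available on every replica.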
NewSQL into the picture
While various NoSQL databases are now in use, another paradigm has been slowly emerging alongside them: NewSQL. This latest database approach combines the strengths of RDBMS in terms of consistency with those of NoSQL in terms of scalability. NewSQL achieves this through new architectural patterns and efficient storage engines for SQL.
The latest NewSQL databases are primarily based on Google's Spanner database and the academic paper on Calvin from Yale. Google's Spanner is a scalable, multi-version, globally distributed, and synchronously replicated database. It was the first system of its class to distribute data on a global scale and support externally consistent distributed transactions.
The Calvin paper from Yale likewise describes a system that provides active replication along with ACID conformity for distributed transactions. Some of the other leading NewSQL DBs are CockroachDB, TiDB, Vitess, and FaunaDB. Each of these has its own take on ensuring the two primary characteristics of scalability and consistency. Next, we will dig deeper into these databases and explore some of the top options, as suggested by RemoteDBA.com experts.
TiDB is an open-source DB that supports distributed Hybrid Transactional and Analytical Processing (HTAP) and is compatible with MySQL. Even though it is open source, it is backed by a company named PingCAP. The first version of TiDB was released back in 2017, and the latest version at the time of writing is 2.1.9.
- Hybrid – TiDB can support both analytical processing (OLAP) and transaction processing (OLTP) workloads, so there is no need for ETL from a transactional database to an analytical one.
- Cloud – TiDB can be operated entirely in the cloud, whether private, public, or hybrid. TiKV, the storage layer of TiDB, was accepted as a sandbox project by the Cloud Native Computing Foundation.
- Compatible with MySQL – TiDB can serve applications written for MySQL servers, and connections can be established with existing MySQL client libraries without any changes to the application.
CockroachDB is another open-source distributed SQL DB, built on a strongly consistent, transactional key-value store. It scales horizontally, both up and down, and can survive disk, machine, rack, or even entire datacenter failures with minimal manual intervention. CockroachDB supports consistent ACID transactions and provides a familiar SQL API for structuring, manipulating, and querying data. CockroachDB is backed by a company named Cockroach Labs; the first version of this DB was released back in 2015, and the latest version at the time of writing is 19.1.1.
- SQL compatibility – Even though CockroachDB has a strongly consistent, distributed transactional key-value store as its base, its external API is SQL compatible.
- Multi-active availability – Every node in a CockroachDB cluster can serve reads and writes without conflicts. Multiple replicas run identical services, and traffic gets routed to all of them. If any replica fails, the others can instantly take over its traffic.
- Online schema changes – CockroachDB offers built-in online schema changes, a simple model for updating a table schema without adverse consequences for the application. The table schema can change while the database is still running: the schema change runs in the background, and queries continue to be processed with no impact on reads or writes.
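The multi-active availability point above can be illustrated with a toy majority-quorum model. This is only a sketch of the general idea that a cluster keeps accepting writes while a majority of replicas is reachable; the `Node` and `Cluster` classes are hypothetical and are not CockroachDB code or its API, and node recovery is deliberately left out.

```python
class Node:
    """A replica that can be switched off to simulate a failure."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def _live(self):
        return [n for n in self.nodes if n.up]

    def has_quorum(self):
        # A strict majority of all replicas must be reachable.
        return len(self._live()) > len(self.nodes) // 2

    def write(self, key, value):
        """Apply the write on every live replica, but only if a majority is up."""
        if not self.has_quorum():
            return False
        for node in self._live():
            node.data[key] = value
        return True

cluster = Cluster(["n1", "n2", "n3"])
first = cluster.write("k", "v1")   # 3 of 3 replicas up
cluster.nodes[0].up = False
second = cluster.write("k", "v2")  # 2 of 3 up: still a majority
cluster.nodes[1].up = False
third = cluster.write("k", "v3")   # 1 of 3 up: no quorum, write refused
survivor_value = cluster.nodes[2].data["k"]
```

Losing one of three replicas does not interrupt writes, while losing two does. Refusing the third write is what keeps the surviving copy consistent rather than letting it drift on its own.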
FaunaDB is another modern distributed cloud database, well suited to container-centered environments. It is considered the first commercial DB inspired by Calvin, with a strictly serializable protocol for transactions in multi-region environments. FaunaDB is backed by Fauna, which offers on-premise, cloud, and serverless versions of it.
- Active-Active – It supports a multi-cloud, master-less, Active-Active architecture, which is designed to keep the database available without downtime.
- Multiple models – It can also support numerous data models, such as graph, relational, and document.
- Temporality of data – A snapshot-based storage engine retains historical data for a configured period and also permits errors to be corrected in those snapshots themselves.
- Horizontal scalability – Users can add or remove nodes, at a single site or across multiple global locations, without disrupting the application.
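The temporality feature in the list above can be pictured as a store that keeps every version of a value together with its timestamp. The `TemporalStore` class below is a hypothetical sketch of that idea, not FaunaDB's storage engine or API; integer timestamps and the `prune` retention rule are assumptions for illustration.

```python
import bisect

class TemporalStore:
    """Keeps every version of each key so reads can go back in time."""

    def __init__(self):
        # key -> (ascending timestamps, values written at those timestamps)
        self._history = {}

    def put(self, key, value, ts):
        stamps, values = self._history.setdefault(key, ([], []))
        if stamps and ts <= stamps[-1]:
            raise ValueError("timestamps must increase per key")
        stamps.append(ts)
        values.append(value)

    def get(self, key, at=None):
        """Return the latest value, or the value as of timestamp `at`."""
        stamps, values = self._history.get(key, ([], []))
        if not stamps:
            return None
        if at is None:
            return values[-1]
        i = bisect.bisect_right(stamps, at)
        return values[i - 1] if i else None

    def prune(self, cutoff):
        """Drop versions no longer needed to answer reads at or after `cutoff`."""
        for stamps, values in self._history.values():
            keep_from = max(bisect.bisect_right(stamps, cutoff) - 1, 0)
            del stamps[:keep_from]
            del values[:keep_from]

store = TemporalStore()
store.put("price", 10, ts=1)
store.put("price", 12, ts=5)
store.put("price", 15, ts=9)

latest = store.get("price")
as_of_6 = store.get("price", at=6)
store.prune(cutoff=6)                  # retention window now starts at ts=6
before_window = store.get("price", at=2)
still_as_of_6 = store.get("price", at=6)
```

Reads inside the retention window still return the value that was current at that time, while history older than the window is discarded.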
If you are looking for more choices among NewSQL DBs, some options worth exploring are Vitess, Citus, VoltDB, NuoDB, and ClustrixDB. Just as the NoSQL wave gained momentum during the internet era, the NewSQL wave is now gaining momentum in the public cloud era.