How Vaxine works

Vaxine is a global database platform that uses transactional causal+ consistency and rich-CRDTs to provide low latency with strong data integrity for geo-distributed OLTP applications.

Global database

Vaxine is based on AntidoteDB, a planet-scale database system that allows you to serve data close to your users, wherever they are in the world.

Multi-master

Antidote is a multi-master, active-active system. This means that every database cluster can handle writes. The Vaxine Mesh automatically geo-distributes your data and routes requests to the nearest region.

Multi-region

Each geographical region has a database cluster. Within the region/cluster, data is partitioned and sharded across servers with snapshot isolation. This provides fault tolerance, redundancy and strong consistency.

Data is then replicated between regions with transactional causal+ consistency using asynchronous, operation-based replication. (See the Consistency section below for more details.)

The stack looks like this:

[Figure: Overview of the Vaxine technical architecture.]

You can learn more about the architecture of AntidoteDB here.

Relational OLTP

Antidote is a CRDT database optimised for OLTP workloads. It’s designed for low-latency interactive workloads, not for very large data throughput or analytics-style queries. It’s primarily a key-value store, with consistent primary indexes.

Vaxine is building a rich query and rich-CRDT layer on top of Antidote. This is connected to the Vax data access library, which allows you to work with CRDTs in a similar way to working with standard relational schemas. Under the hood, operations are translated to Antidote’s native API and wrapped in highly available transactions.

Low latency

Traditional geo-distributed databases, like Cockroach and Fauna, use synchronization and consensus algorithms to maintain consistency. The trouble is that consensus introduces extra network requests between regions, resulting in higher write-path latency. For example, the following table illustrates typical write-latency times, assuming a five-region topology and a single web request that initiates a single database write:

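| Approach            | User to edge | Edge to app | App to DB | Consensus | Total |
|---------------------|--------------|-------------|-----------|-----------|-------|
| Centralised         | 10ms         | 200ms       | 0ms       | 0ms       | 210ms |
| Geo app, central db | 10ms         | 30ms        | 180ms     | 0ms       | 220ms |
| Geo db, consensus   | 10ms         | 30ms        | 1ms       | 400ms     | 441ms |
| Vaxine              | 10ms         | 30ms        | 1ms       | 0ms       | 41ms  |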

The picture sharpens further if the client request initiates multiple database requests. For example, a controller that issues a query, followed by two writes:

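| Approach            | User to edge | Edge to app | App to DB | Consensus | Total |
|---------------------|--------------|-------------|-----------|-----------|-------|
| Centralised         | 10ms         | 200ms       | 0ms       | 0ms       | 210ms |
| Geo app, central db | 10ms         | 30ms        | 380ms     | 0ms       | 420ms |
| Geo db, consensus   | 10ms         | 30ms        | 3ms       | 900ms     | 943ms |
| Vaxine              | 10ms         | 30ms        | 3ms       | 0ms       | 43ms  |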
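
Each total in the table above is simply the sum of the per-hop latencies on the request path. The following minimal sketch reproduces that arithmetic for the multi-request case. The figures are copied from the table; the names `SCENARIOS` and `total_latency_ms` are illustrative and not part of any Vaxine API.

```python
# Per-hop latencies in milliseconds, copied from the multi-request table.
# `SCENARIOS` and `total_latency_ms` are hypothetical names for illustration.
SCENARIOS = {
    "Centralised":         {"user_to_edge": 10, "edge_to_app": 200, "app_to_db": 0,   "consensus": 0},
    "Geo app, central db": {"user_to_edge": 10, "edge_to_app": 30,  "app_to_db": 380, "consensus": 0},
    "Geo db, consensus":   {"user_to_edge": 10, "edge_to_app": 30,  "app_to_db": 3,   "consensus": 900},
    "Vaxine":              {"user_to_edge": 10, "edge_to_app": 30,  "app_to_db": 3,   "consensus": 0},
}


def total_latency_ms(hops):
    """Sum the latency of every hop on the request path."""
    return sum(hops.values())


for name, hops in SCENARIOS.items():
    print(f"{name}: {total_latency_ms(hops)}ms")
```

The consensus column dominates the total for the consensus-based approach, which is why removing it from the write path has such a large effect.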

CRDTs

Vaxine solves write-path latency by using conflict-free replicated data types (CRDTs).

CRDTs are data types that can merge concurrent writes without conflicts. They allow clusters to accept writes without cross-region synchronization or replication consensus. Instead, writes can be accepted with low latency and replicated asynchronously, with commutative merge operations ensuring that all clusters converge to the same state (strong eventual consistency).
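
To make the idea of commutative merge operations concrete, here is a minimal sketch of an operation-based counter. It is not Antidote's implementation: the class `OpCounter` and the operation format `(op_id, delta)` are assumptions made for illustration. Because addition commutes, replicas that receive the same operations in different orders end up with the same value.

```python
class OpCounter:
    """Toy operation-based counter CRDT (illustrative, not Antidote's code)."""

    def __init__(self):
        self.value = 0
        self.seen = set()  # ids of operations already applied

    def apply(self, op):
        """Apply an operation of the form (op_id, delta) at most once."""
        op_id, delta = op
        if op_id in self.seen:  # ignore duplicate deliveries
            return
        self.seen.add(op_id)
        self.value += delta


# Two regions accept writes locally, without talking to each other first.
eu, us = OpCounter(), OpCounter()
op_a = ("eu-1", +5)
op_b = ("us-1", -2)
eu.apply(op_a)
us.apply(op_b)

# Operations are then replicated asynchronously and arrive in different orders.
eu.apply(op_b)
us.apply(op_a)

print(eu.value, us.value)
```

Both replicas apply the same set of operations, so they converge even though neither waited for the other before accepting its local write.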

You can see the data types supported by Antidote here, in the antidote_crdt repo and in this CRDT visualizer. Vaxine is working to extend the CRDTs supported by Antidote. See the Types documentation for more details.

Coordination avoidance

Vaxine also supports rich-CRDTs that use cross-regional coordination when necessary to preserve invariants. Vaxine aims to avoid this coordination where possible using proactive background processes. The Vax data access library also follows ECD3 guidelines to keep writes surgically precise.

Data integrity

Antidote implements the Cure protocol, which provides transactional causal+ consistency. Vaxine extends Antidote with rich-CRDTs that preserve invariants. This combination of TCC+ with invariant safety is how Vaxine ensures strong data integrity.

Consistency

The Cure protocol was designed by academics (including Annette Bieniusa, Nuno Preguiça and Marc Shapiro; see the Team) as a “cure for consistency under partitioning” for a synchronization-free, fault-tolerant, high-availability database. Cure provides causal consistency and highly available transactions. Combined with sticky availability, these provide transactional causal+ consistency (TCC+), the strongest possible consistency mode for a low latency database[1].

Causal consistency

Causal consistency guarantees that if a read or write depends on the effects of a previous write then the causal order between them will be respected. However, if two operations are concurrent and have not seen each other, then it’s fine for them to be applied in any order. It also implies that you read your own writes, as per the Jepsen consistency diagram.

Ultimately, a geo-distributed database needs to take into account the relative perspectives of the different replicas in the system. Because replicas are in different locations and because information can only travel at the speed of light, the answer to “which of these events happened first?” depends on which perspective you answer from. Causal consistency embraces this uncertainty.
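
One common way to track the happens-before relation is with vector clocks. The sketch below is a generic illustration of that technique, not a description of how Cure tracks causality internally. The function names and region labels are assumptions. A clock maps each replica to the number of that replica's events it has seen.

```python
def leq(a, b):
    """True if every entry of clock a is <= the matching entry of clock b."""
    return all(a.get(k, 0) <= b.get(k, 0) for k in set(a) | set(b))


def happened_before(a, b):
    """True if the event stamped a causally precedes the event stamped b."""
    return leq(a, b) and not leq(b, a)


def concurrent(a, b):
    """True if neither event has seen the other."""
    return not leq(a, b) and not leq(b, a)


w1 = {"eu": 1}           # a write accepted in the EU region
w2 = {"eu": 1, "us": 1}  # a US write made after seeing w1
w3 = {"ap": 1}           # an independent write in the AP region

print(happened_before(w1, w2))  # w2 depends on w1, so order must be respected
print(concurrent(w2, w3))       # neither saw the other, so any order is fine
```

Here `w1` must be applied before `w2` everywhere, while `w2` and `w3` may be applied in either order.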

This video on logical time is helpful for understanding the happens-before relation that causal consistency guarantees.

Highly available transactions

Highly available transactions guarantee the atomic application of a set of operations. That is, you can wrap multiple writes within a transaction and those writes will either all be applied or none will be applied. See the paper for more information about the guarantees available under high availability and sticky availability.
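
The all-or-nothing behaviour can be sketched with a toy in-memory store. This is an illustration of atomic application only, under the assumption of a single-threaded, single-node store. The `Transaction` class is hypothetical and is not the Antidote or Vax API.

```python
class Transaction:
    """Buffers writes and applies them together on commit (toy example)."""

    def __init__(self, store):
        self._store = store
        self._writes = {}

    def write(self, key, value):
        self._writes[key] = value  # buffered, not yet visible in the store

    def commit(self):
        self._store.update(self._writes)  # all buffered writes become visible
        self._writes = {}

    def abort(self):
        self._writes = {}  # discard every buffered write

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.commit()
        else:
            self.abort()
        return False  # do not swallow the exception


store = {}
with Transaction(store) as tx:
    tx.write("a", 1)
    tx.write("b", 2)

try:
    with Transaction(store) as tx:
        tx.write("a", 100)
        raise RuntimeError("something failed mid-transaction")
except RuntimeError:
    pass

print(store)
```

The first transaction applies both writes; the second fails part-way through, so none of its writes reach the store.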

Sticky availability

High availability allows writes to be accepted at every server, wherever it is in the world. Sticky availability is a mode of high availability where clients always talk to the same server. When this condition is true, it allows high-availability systems to provide additional consistency guarantees, including read-your-own-writes and causal consistency.
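
As a rough illustration of the "same server" condition, the sketch below pins each client to one server using a stable hash of a client identifier. The server names, the `sticky_server` function and the hashing scheme are all assumptions. A real deployment would more likely pin a client to its nearest region, and the text does not say how Vaxine does this.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["eu-west", "us-east", "ap-south"]


def sticky_server(client_id, servers=SERVERS):
    """Map a client to the same server on every call, using a stable hash."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first = sticky_server("client-42")
second = sticky_server("client-42")
print(first == second)
```

Because the mapping is deterministic, a client's later reads go to the server that accepted its earlier writes, which is what makes guarantees such as read-your-own-writes possible.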

Note that Vaxine is designed primarily for low latency, not guaranteed availability. In particular, some rich-CRDTs rely on communication mechanisms (such as sharing out rights to increment a counter) that can temporarily degrade availability in the event of cluster failures or partitions.

TCC+

All of the above properties – CRDTs, causal consistency, highly available transactions and sticky availability – combine in the Cure protocol to provide transactional causal+ consistency, or TCC+. This is the consistency baseline provided by the Vaxine system.

Invariants

In addition to TCC+, Vaxine also extends Antidote to provide additional invariant safety. An invariant is a guarantee that your database provides that your application can rely on. For example, a guarantee that usernames are unique, or that you can never have more than 8 active users on a team. Relational databases often allow you to define these guarantees using constraints, such as a unique constraint or a check constraint.

Rich-CRDTs

Vaxine guarantees invariants using rich-CRDTs. These are CRDTs that:

  1. extend and compose the replication semantics of basic CRDTs; and
  2. in some cases introduce additional communication protocols in order to preserve invariants

Vaxine is working to extend Antidote with rich-CRDTs to preserve invariants including numeric invariants, unique constraints and referential integrity. See this post introducing rich-CRDTs and the reference docs for more information.
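
To illustrate how a numeric invariant can be preserved mostly without coordination, here is a minimal sketch in the spirit of "sharing out rights to increment a counter". It is an assumption-laden toy, not Vaxine's rich-CRDT implementation: the class `BoundedCounterReplica` is hypothetical, and the direct method call in `transfer_rights` stands in for a cross-region message. The invariant is that the total count never exceeds 8, for example at most 8 active users on a team.

```python
class BoundedCounterReplica:
    """One replica of a counter whose global total must stay within a limit."""

    def __init__(self, name, rights):
        self.name = name
        self.rights = rights  # increments this replica may still make locally
        self.increments = 0   # increments this replica has made so far

    def increment(self):
        """Increment locally if rights remain; otherwise signal that
        coordination with another replica is needed."""
        if self.rights == 0:
            return False
        self.rights -= 1
        self.increments += 1
        return True

    def transfer_rights(self, other, n):
        """Give up to n unused rights to another replica (needs communication)."""
        n = min(n, self.rights)
        self.rights -= n
        other.rights += n
        return n


# A limit of 8 is shared out in advance: 5 rights to EU, 3 to US.
eu = BoundedCounterReplica("eu", 5)
us = BoundedCounterReplica("us", 3)

results = [eu.increment() for _ in range(6)]  # the 6th attempt has no rights left
moved = us.transfer_rights(eu, 2)             # coordination only when needed
retry = eu.increment()

print(results, moved, retry, eu.increments + us.increments)
```

Local increments need no cross-region round trip as long as a replica holds rights. Coordination is only required when a replica runs out, which is also why a partition can temporarily reduce availability for this kind of data type.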

[1] See the Jepsen description of causal consistency. The Highly Available Transactions paper also has a very good walk through of consistency guarantees that are possible under different AP modes.