Why Vaxine?

Vaxine solves the global write-path latency problem for backend applications.

Tl;dr

Geo-distributed apps need a geo-distributed database that’s fast for both reads and writes. CRDTs unlock low write-path latency. However, commutativity is not enough to guarantee data integrity. For that, you need a rich-CRDT database that also provides invariant safety.

Need for geo-distribution

Let’s assume that you have a database-driven backend web application. It has a typical relational data model, including various constraints and invariants. You have users interacting with it in different regions. You’re aware that the system performance is dominated by network latency. So you want to run your app and your data close to your users.

For example, by deploying on Fly.io or Cloud Run.

Problem of write-path latency

You can use read-replica fan out (say with Cloud SQL or PolyScale) but this leaves your writes centralised and slow. Alternatively, you can use an active-active geo-distributed database like Cockroach or Fauna. However, these use synchronisation to achieve consensus when writing data – which actually makes your writes much slower.

What you want is to accept writes at the nearest region immediately, without synchronising.

Challenge of concurrent writes

You can achieve this with eventual consistency systems. These range from Yugabyte multi-DC with asynchronous “XCluster” replication (see Yugabyte’s documentation on its multi-region modes), to more specialist systems like Automerge or Macrometa. However, this solves one problem by creating another: how to resolve overlapping concurrent writes.

If two nodes in different regions accept writes to the same data at the same time, without stopping to coordinate or check with each other, then you need a strategy to resolve the competing writes. There are two approaches: commutativity[1] and tentative updates (or some combination of the two).

Commutativity != data integrity

The commutative approach says “OK, we’ve accepted this transaction on our Las Vegas cluster, so we’re going to have to apply it to our Frankfurt and Tokyo clusters, otherwise we’ll have inconsistent data”. This is where Conflict-free Replicated Data Types (CRDTs) come in. They guarantee that a write accepted at one cluster can be applied on all of the others.
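To make the merge property concrete, here is a minimal sketch of a state-based counter CRDT in Python. It is an illustration only, not Vaxine's or AntidoteDB's API: the `PNCounter` class, its method names and the region labels are all hypothetical. It shows that replicas converge on the same state regardless of the order in which they merge.

```python
from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """Toy state-based counter: per-replica increment and decrement tallies."""

    incs: dict = field(default_factory=dict)
    decs: dict = field(default_factory=dict)

    def increment(self, replica, n=1):
        self.incs[replica] = self.incs.get(replica, 0) + n

    def decrement(self, replica, n=1):
        self.decs[replica] = self.decs.get(replica, 0) + n

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        """Element-wise max, which is commutative, associative and idempotent."""
        merged = PNCounter()
        for key in self.incs.keys() | other.incs.keys():
            merged.incs[key] = max(self.incs.get(key, 0), other.incs.get(key, 0))
        for key in self.decs.keys() | other.decs.keys():
            merged.decs[key] = max(self.decs.get(key, 0), other.decs.get(key, 0))
        return merged


# Three regions accept writes locally, without coordinating.
las_vegas, frankfurt, tokyo = PNCounter(), PNCounter(), PNCounter()
las_vegas.increment("las_vegas", 5)
frankfurt.increment("frankfurt", 2)
tokyo.decrement("tokyo", 1)

# Merging in different orders yields the same state.
a = las_vegas.merge(frankfurt).merge(tokyo)
b = tokyo.merge(las_vegas).merge(frankfurt)
assert a == b
assert a.value() == 6
```

Because every replica only ever writes to its own slot and merging takes the maximum per slot, no write can be lost or applied twice, whichever order the updates arrive in.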

The trouble is that whilst the commutativity of CRDTs guarantees conflict-free replication and merging, it doesn’t guarantee data integrity. Think about two concurrent transactions in Frankfurt and Tokyo. Imagine they were both selling the last ringside ticket to a boxing match in Las Vegas. Imagine your database has an invariant that says “you can’t sell more ringside tickets than there are ringside seats”. Each individual transaction satisfies this. But the combination of two concurrent transactions breaks the invariant[2].
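The ticket scenario can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Vaxine stores data: `TicketReplica`, `SEATS` and the region names are hypothetical. Each replica checks the invariant against its own local state only, then the replicas merge.

```python
SEATS = 1  # assumption: exactly one ringside seat is left


class TicketReplica:
    """Toy replica that tracks tickets sold per region as a grow-only counter."""

    def __init__(self, name):
        self.name = name
        self.sold = {}

    def remaining(self):
        return SEATS - sum(self.sold.values())

    def sell(self):
        # Invariant check against local state only, with no coordination.
        if self.remaining() < 1:
            return False
        self.sold[self.name] = self.sold.get(self.name, 0) + 1
        return True

    def merge(self, other):
        # Conflict-free merge: take the per-region maximum.
        for region, count in other.sold.items():
            self.sold[region] = max(self.sold.get(region, 0), count)


frankfurt = TicketReplica("frankfurt")
tokyo = TicketReplica("tokyo")

# Both sales happen concurrently, before either replica hears from the other.
frankfurt_ok = frankfurt.sell()
tokyo_ok = tokyo.sell()

# Replication then merges cleanly, and both replicas converge.
frankfurt.merge(tokyo)
tokyo.merge(frankfurt)

assert frankfurt_ok and tokyo_ok          # each sale was valid locally
assert frankfurt.sold == tokyo.sold       # the replicas agree
assert frankfurt.remaining() == -1        # but one seat was sold twice
```

The merge itself never conflicts. The problem is that convergence says nothing about whether the converged state is one the application allows.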

Tentativity is hard to trust

The other approach is to reserve the right to abort a transaction, even after it’s been accepted by one of the active clusters. This is tentativity – where you tentatively apply an update, with the understanding that it could still be invalidated or reversed. It is implemented by systems like Couch and Replicache.

Tentativity is similar to re-modelling without invariants, in the sense that it also bubbles the complexity up to the application layer, which has to handle the consequences of data consistency concerns. In our example above, one of the transactions can be rejected and that fact replicated back through the system, bubbling all the way back up to the user.

Which is quite a stressful UX to unravel if you’ve just paid $40,000 for a ringside seat.

Vaxine solution

So what’s the solution? After all, you want to operate with low latency, you do want to be able to rely on constraints and invariants, and you really do want to be able to trust your database.

What you need is a next generation geo-distributed database system. One that’s designed to operate with low-latency, commutative data structures by default. Whilst still allowing you to work with a relational data model and rely on constraints and invariants when necessary.

You want that system to expose the appropriate complexity and primitives to be able to specify and craft latency vs consistency trade-offs. Without leaking unnecessary complexity into the application domain. And you want to optimise your application for coordination avoidance[3].
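One way to picture this trade-off is an escrow-style counter, in the spirit of the Bounded Counter mentioned in footnote [2]. The sketch below is a simplified illustration in Python, not Vaxine's or AntidoteDB's implementation: the class, the `request_rights` helper and the `coordination_rounds` field are hypothetical, and the cross-region transfer is simulated with a direct method call. Each replica holds a share of "rights" to sell, spends them locally with no coordination, and only goes to the network when its local share runs out.

```python
class BoundedCounterReplica:
    """Toy escrow-style counter: a replica may only spend rights it holds."""

    def __init__(self, name, rights):
        self.name = name
        self.rights = rights  # sales this replica can accept locally
        self.coordination_rounds = 0

    def sell(self):
        """Fast path: purely local, no network round trip."""
        if self.rights < 1:
            return False
        self.rights -= 1
        return True

    def request_rights(self, donor, n=1):
        """Slow path: stands in for a cross-region call that moves rights."""
        self.coordination_rounds += 1
        granted = min(n, donor.rights)
        donor.rights -= granted
        self.rights += granted
        return granted


def sell_with_fallback(replica, donor):
    """Try the local fast path first and coordinate only when needed."""
    if replica.sell():
        return True
    if replica.request_rights(donor) > 0:
        return replica.sell()
    return False


# One seat left, and the right to sell it is held by Frankfurt.
frankfurt = BoundedCounterReplica("frankfurt", rights=1)
tokyo = BoundedCounterReplica("tokyo", rights=0)

results = [
    sell_with_fallback(frankfurt, tokyo),  # local fast path
    sell_with_fallback(tokyo, frankfurt),  # must coordinate, and is refused
]

assert results == [True, False]           # the seat is sold exactly once
assert frankfurt.coordination_rounds == 0  # the common case paid no latency
assert tokyo.coordination_rounds == 1
```

The total number of rights never exceeds the number of seats, so the invariant holds without any replica checking global state on the fast path. Coordination becomes the exception and not the cost of every write.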

Rich CRDT database

That’s what we’re building with Vaxine. A next-generation global database that operates with low latency and strong data integrity.

Vaxine builds on AntidoteDB, a planet-scale database that implements the Cure protocol, formally proven to be the strongest possible consistency mode for a highly available, low latency database[4].

Vaxine is working to extend Antidote with:

Through this, we aim to make low-latency CRDT technology accessible to a wide range of mainstream applications.

[1] You can read about commutativity in Keeping CALM: When Distributed Consistency Is Easy.

[2] You can solve this with a rich-CRDT called a Bounded Counter which extends commutativity with network coordination to enforce numeric invariants.

[3] Minimising network-based coordination to achieve consistency or consensus. See the ECD3 guidelines.

[4] See Cure: strong semantics meets high availability and low latency. Cure combines causal+ consistency with highly available transactions (see the HAT paper) and sticky availability.

[5] See IPA: invariant-preserving applications for weakly consistent replicated databases.