Versioned Data Management System Design
Introducing a Reliable Way to Manage Your Critical Data
Introduction
Previously, I introduced a distributed ledger system and explained, at a technical level, how to build a data store that consistently supports version history. Expanding on that, this post draws parallels to Git and introduces a data management workflow built on the same idea.
Version History for Data
As discussed in the distributed ledger system, we maintain a linear history for each piece of data. We can look up the current state by retrieving the entry with the most recent version number. Given an earlier version number, we can also retrieve the historical state as of that version.
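As a rough illustration of the lookup described above, here is a minimal in-memory sketch. The names `Entry`, `VersionedRecord`, `append`, `current`, and `at` are hypothetical and chosen for this example only; the sketch also assumes versions start at 1 and increase by one per change, which the ledger design may or may not require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One immutable entry in the history (hypothetical shape)."""
    version: int
    state: int


class VersionedRecord:
    """Append-only linear history for a single piece of data."""

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    def append(self, state: int) -> Entry:
        # Assumption: versions start at 1 and grow by one per append.
        entry = Entry(version=len(self._entries) + 1, state=state)
        self._entries.append(entry)
        return entry

    def current(self) -> Entry:
        # The current state is the entry with the most recent version.
        if not self._entries:
            raise LookupError("record has no history yet")
        return self._entries[-1]

    def at(self, version: int) -> Entry:
        # Historical state as of the given version number.
        if not 1 <= version <= len(self._entries):
            raise KeyError(version)
        return self._entries[version - 1]


record = VersionedRecord()
for balance in (100, 70, 120):
    record.append(balance)
```

Here `record.current()` returns the version 3 entry, while `record.at(2)` returns the state as it was after the second change. Existing entries are never modified, only appended to.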
To find the delta for a specific version, there are two approaches: 1) subtracting the previous state from the current state (or otherwise comparing the two), or 2) storing the delta alongside the data entry. The diagram below illustrates the latter method.
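The two approaches can be sketched side by side. This is only an illustration: the `Entry` fields and the helper names `append_with_delta` and `derived_delta` are made up for this example, and it assumes the state before version 1 is 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    version: int
    state: int
    delta: int  # approach 2: the delta is stored alongside the state


def append_with_delta(history: list[Entry], delta: int) -> Entry:
    """Append a new entry, recording both the new state and the delta."""
    previous = history[-1].state if history else 0  # assumed initial state: 0
    entry = Entry(version=len(history) + 1, state=previous + delta, delta=delta)
    history.append(entry)
    return entry


def derived_delta(history: list[Entry], version: int) -> int:
    """Approach 1: subtract the previous state from the state at `version`."""
    current = history[version - 1].state
    previous = history[version - 2].state if version > 1 else 0
    return current - previous


history: list[Entry] = []
for change in (100, -30, 50):
    append_with_delta(history, change)
```

For every entry in `history`, `derived_delta(history, entry.version)` matches the stored `entry.delta`, so the stored field is redundant in this integer case. Storing it mainly saves the second read.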
Numerous types of data can benefit from maintaining a comprehensive record of all the historical mutations. Examples include financial account data, supply chain inventory data, and critical configuration data.
As a concrete example, consider a bank checking account. Beyond simply tracking the current balance, it is crucial to have access to a detailed history of all transactions.
It is not hard to see that this system closely resembles version control systems for source code. Both maintain an immutable changelog, with one key distinction: in source control, the focus is on the delta, whereas in the versioned data system, it is more important to keep track of the current state. Storing the delta is optional, since it can easily be derived by subtracting the previous state from the current state.
In addition, the diagram above illustrates the “state” as an integer. In practice, the state can be any arbitrary value, such as the contents of a key-value store.
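When the state is a key-value mapping rather than an integer, "subtracting" becomes a comparison of keys and values. The sketch below shows one possible way to derive such a delta. The function name `diff_states`, the shape of its result, and the sample configuration keys are all assumptions made for illustration.

```python
def diff_states(previous: dict, current: dict) -> dict:
    """Derive the delta between two key-value states by comparing them."""
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: (previous[k], current[k])  # (old value, new value)
        for k in previous.keys() & current.keys()
        if previous[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Two consecutive versions of a hypothetical configuration record.
v1 = {"timeout_ms": 500, "retries": 3}
v2 = {"timeout_ms": 800, "retries": 3, "region": "us-east"}

delta = diff_states(v1, v2)
```

In this example, `delta` reports `region` as added and `timeout_ms` as changed from 500 to 800, while `retries` does not appear because it is identical in both versions.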