Understanding the data layer for records people and where it will take us - if we embrace it
In records, we're inherently uncomfortable with data.
That's a fact.
The idea that something isn't a thing that we can manage is anathema to thousands of years of practice.
It shouldn't be that hard though.
We just need to think paragraphs instead of documents.
We have many paragraphs that in themselves are accurate representations of facts as they were understood at certain points in time.
Sometimes we need the paragraph that tells us where someone lives.
Sometimes we need the paragraph that tells us what they are allergic to.
The huge advantage that data has over us is that it has solved the biggest problem we face - the problem of too much information.
Whatever people say about "drowning in data," when they're drowning in documents (what we might call records) the problem is infinitely worse, because when someone wants information we have to present them with the whole document - even if they only want the address and allergy details that are printed on page 23 and page 112 of two separate documents.
Data is just about giving people the paragraphs they want, not the whole document.
The magic is in making sure that we understand the provenance of the paragraphs, so that we know they are reliable information on which we can base our business. Data people refer to this as lineage, and it works when it's done well but, just like provenance and authoritative documents, it's often taken on faith. What data has that we don't is statistics - and when businesses use non-authoritative statistics, things go wrong at scale, whereas when they use non-authoritative records, the damage tends to be a little more localised.
Where we struggle with the composable nature of data is that we generally have a functional way of thinking: when we look at two paragraphs, we don't feel like we can orient ourselves around them as a record unless they were created by a single transaction of a single business activity.
This means that when we come to manage the thing that most of us spend most of our time thinking about - destruction - we don't feel like we can wrap our heads around what we're supposed to be destroying.
The simple reason for this is that we still think retention schedules are a valuation model.
They're not.
They're a tool from an era in which we knew that the documents created by a transaction would be needed for a certain period of time because they only had one use.
Inherent in them is still this idea that there is a transaction, and that's the whole of the thing, and we should retain and destroy the whole of the thing as we need to.
Frank Upward and co. saw the problem inherent in this, the one that data exposes, when they originated the continuum model.
Unfortunately, most of us still haven't embraced that way of thinking, so we want a thing that represents the whole transaction, that we can take custody of and eventually destroy. If there's no whole "thing" that represents the transaction, we don't even really know how to think about a lifecycle, because we still think “proof and liability” instead of “value.”
What people want is the information they need (and only the information they need), when they need it, delivered to them reliably.
And records massively overdeliver on the information front - and not in a good way.
The average document has masses of information that's just not necessary for anyone.
The average field in a database has exactly the data needed to describe the thing someone needs to know at a level of quality that is appropriate to the task they have to do with the data.
When we use individual data fields, and compose them into a record, we can give people just what they want, when they want it - reliably.
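For readers who think better in something concrete, here is a minimal sketch of what composing a record from individual fields might look like. Everything in it is an assumption made for illustration: the Field structure, the compose_record function, and the sample subjects, sources and dates are all hypothetical, and a real system would carry far richer lineage than a single source string.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Field:
    """One 'paragraph': a single fact plus where and when it came from."""
    subject: str        # who or what the fact is about
    name: str           # e.g. "address" or "allergy"
    value: str
    source: str         # hypothetical pointer to the originating document
    recorded_on: date   # when the fact was understood to be true


def compose_record(fields, subject, wanted):
    """Return the most recent Field for each wanted name, for one subject."""
    latest = {}
    for f in fields:
        if f.subject != subject or f.name not in wanted:
            continue  # not asked for, so never delivered
        current = latest.get(f.name)
        if current is None or f.recorded_on > current.recorded_on:
            latest[f.name] = f
    return latest


# Hypothetical sample data, invented purely for this sketch.
FIELDS = [
    Field("person-1", "address", "12 Old Street", "doc-A p.23", date(2019, 3, 1)),
    Field("person-1", "address", "7 New Road", "doc-C p.2", date(2022, 6, 15)),
    Field("person-1", "allergy", "penicillin", "doc-B p.112", date(2020, 1, 10)),
    Field("person-1", "employer", "Acme", "doc-A p.40", date(2019, 3, 1)),
]

record = compose_record(FIELDS, "person-1", {"address", "allergy"})
for name, f in sorted(record.items()):
    print(f"{name}: {f.value} (from {f.source}, {f.recorded_on})")
```

The person asking gets the current address and the allergy, each with its provenance attached, and nothing else. The employer field and the superseded address still exist, but nobody has to read past them.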
Privacy is giving us a bit of a moment in the sun again because the idea of a transaction and a business purpose for that transaction data matters again - but it's the last gasp of the document-transaction lifecycle model.
The future is going to be much smaller - records won't be things, and they won't be databases. They'll be streams of data (probably graph data) that we'll compose and recompose as the relationships between things expose their value, and we'll prune the branches that are of no value as we go.
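A rough sketch of what pruning might mean in practice, under one loudly stated assumption: that a piece of data has value only while it is connected to something the organisation still values. That rule is a stand-in chosen for this example, not a proposed valuation model, and the prune function and node names are hypothetical.

```python
def prune(edges, valued):
    """Keep only the nodes reachable from a node that still has value.

    edges  maps each node to the nodes it points at.
    valued is the set of nodes we have decided are still of value
           (the assumption in this sketch; real valuation is harder).
    """
    keep = set()
    stack = list(valued)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(edges.get(node, ()))
    # Everything not reached is a branch of no value and is dropped.
    return {n: list(targets) for n, targets in edges.items() if n in keep}


# Hypothetical graph: a person we still deal with, and an old enquiry
# that nothing of value points to any more.
EDGES = {
    "person-1": ["address-2022", "allergy-2020"],
    "address-2022": [],
    "allergy-2020": [],
    "enquiry-2015": ["address-2019"],
    "address-2019": [],
}

pruned = prune(EDGES, {"person-1"})
print(sorted(pruned))
```

Disposal here is not the destruction of a whole "thing" at the end of a retention period. It is the removal of whatever is no longer connected to anything that matters, which in this example is the old enquiry and the address that only it referred to.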
I hope that records adapt to this future, but the first thing we have to do is get our heads around the composable nature of data-as-records, and just what that means for how we value, manage, expose, compose, recompose and dispose of the data that we are holding and that our organisations need.