Defining data at the atomic level


To date, our industry has focused on metadata management and data management at the dataset level. With graph technology, ML (Machine Learning) and today's compute speeds, we no longer need to manage at such a high level. We can, and need to, manage data at the most atomic level, i.e. Data Asset Management at the atomic level. In this post we want to focus on describing what we mean by the atomic level of data.


CDEs, Data Elements and Natural persons

Here we should spend a moment with the DCAM definitions. DCAM defines the following:

Business Element: A unit of information that has a specific meaning in the context of a business process or collection of processes within a data domain.

Business Elements can be Atomic: Lowest level of detail, factual meaning.

Data Element (DE): Unit of data for which the definition, identification, representation and permissible values are specified by means of a set of attributes (ISO 11179-1).

At Pontus we build on these definitions to say that a data element should be managed at the atomic level. That is, each and every instance should (and can now) be managed.

Let us focus on PII data elements: take a DE (Data Element) such as DoB, which is also an element of PII. It is important for us to operate at the atomic level, especially when it comes to our natural person data in a post-GDPR world. Suppose we have a dataset that is a collection of our customers with 100 DoBs. We can now manage each instance of that DoB; that is, we are managing at the instance level of each DE. In other words, we need to manage the individual instance of each CDE for each customer and employee.
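
To make "managing at the instance level" concrete, here is a minimal sketch in Python. The class name, field names and customer identifiers are hypothetical and chosen only for illustration; this is not how Pontus Vision models data. It shows that a dataset of 100 customers yields 100 separately addressable DoB instances.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataElementInstance:
    """One concrete occurrence of a data element (hypothetical model)."""
    element: str     # which data element this is, e.g. "DoB"
    person_id: str   # the natural person the instance belongs to
    datastore: str   # where the instance was found
    value: date      # the concrete value of this instance


def build_instances(customers, datastore="CRM"):
    """Turn a {person_id: dob} mapping into one DoB instance per customer."""
    return [
        DataElementInstance("DoB", person_id, datastore, dob)
        for person_id, dob in customers.items()
    ]


# Illustrative data only: 100 made-up customers, each with one DoB.
customers = {f"cust-{i:03d}": date(1960, 1, 1 + i % 28) for i in range(100)}
instances = build_instances(customers)
```

Each element of `instances` can now carry its own lineage, consent or retention information, rather than those being recorded once for the dataset as a whole.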


Worked example

Using DCAM or DAMA we will assume that we have a Customer data domain defined. Within that Customer data domain, we define the set of CDEs (Critical Data Elements).

We can scan every dataset, structured or unstructured, for any instances of a DoB (Date of Birth) and probabilistically match each one to a natural person.
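
As a rough illustration of the matching idea, the sketch below scores how likely a piece of free text refers to a known person. It is an assumption-laden toy: the function name, the 50/50 weighting and the dd/mm/yyyy date pattern are all hypothetical, and the result is a crude score rather than a calibrated probability. It uses only the Python standard library (`re` and `difflib`).

```python
import re
from difflib import SequenceMatcher

# Assumes dates are written dd/mm/yyyy, as in the example in this post.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def match_score(known_name, known_dob, text):
    """Return a crude score in [0, 1] that `text` refers to the known person.

    Half of the score comes from fuzzy name similarity, half from whether
    the known DoB string appears in the text (hypothetical weighting).
    """
    dates_in_text = {"/".join(parts) for parts in DATE_RE.findall(text)}
    dob_found = known_dob in dates_in_text

    # Compare the name against every window of the same number of words.
    words = text.split()
    n = len(known_name.split())
    best = 0.0
    for i in range(max(1, len(words) - n + 1)):
        window = " ".join(words[i:i + n])
        ratio = SequenceMatcher(None, known_name.lower(), window.lower()).ratio()
        best = max(best, ratio)

    return 0.5 * best + 0.5 * (1.0 if dob_found else 0.0)


email_text = "Dear Rozell Coleman, born 16/08/1960, thank you for subscribing."
other_text = "Invoice 42 for John Smith"
strong = match_score("Rozell Coleman", "16/08/1960", email_text)
weak = match_score("Rozell Coleman", "16/08/1960", other_text)
```

Here `strong` is much higher than `weak`, because the first text contains both a near-exact name and the matching DoB. A production matcher would need far more care (name variants, date formats, false positives).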

Rozell Coleman was born on 16/08/1960 (see first record above). We find a single instance of this record in the CRM (Customer Relationship Management) system of the firm we are interested in. By deploying a Data Asset Management tool, we also find another instance of Rozell Coleman in the email marketing system.

We can query Pontus Vision to establish how many instances of DoB exist for that particular customer (Rozell) across both structured and unstructured datasets.

This enables us to answer questions such as:

How many instances do we have of Rozell, in which datastores (structured and unstructured)?

How many items of PII (and copies) do we have for Rozell?
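
The two questions above can be sketched over a simple in-memory list of discovered instances. This is not the Pontus Vision query interface; the record layout, the function names and the third record are assumptions made purely for illustration. The two Rozell records mirror the example in this post (one in the CRM, one in the email marketing system).

```python
from collections import Counter

# Hypothetical discovered instances; the field names are illustrative.
records = [
    {"person": "Rozell Coleman", "element": "DoB", "datastore": "CRM"},
    {"person": "Rozell Coleman", "element": "DoB", "datastore": "email marketing"},
    {"person": "Another Customer", "element": "DoB", "datastore": "CRM"},
]


def instances_by_datastore(records, person):
    """Count how many instances of a person exist in each datastore."""
    return Counter(r["datastore"] for r in records if r["person"] == person)


def pii_item_count(records, person):
    """Count every PII item (including copies) held for a person."""
    return sum(1 for r in records if r["person"] == person)


rozell_by_store = instances_by_datastore(records, "Rozell Coleman")
rozell_total = pii_item_count(records, "Rozell Coleman")
```

With the sample data, `rozell_by_store` reports one instance in each of the two datastores and `rozell_total` is 2.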

Challenges with this definition?

One of the key challenges for this definition of atomic data is that what counts as atomic changes with the granularity at which we look at the data. At the level of the customer domain, the atomic unit is a customer; zooming in, however, a DoB can itself be split further, so that an atomic element of a DoB is, for example, the month.
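
The granularity point can be shown with a short sketch: the same DoB is a single value at one level and three values at the next. The function name is hypothetical, and the dd/mm/yyyy format is assumed from the example date used in this post.

```python
from datetime import datetime


def decompose_dob(dob_text):
    """Split a dd/mm/yyyy date of birth into its finer-grained parts."""
    parsed = datetime.strptime(dob_text, "%d/%m/%Y")
    return {"day": parsed.day, "month": parsed.month, "year": parsed.year}


dob = "16/08/1960"           # atomic when the context is "the customer's DoB"
parts = decompose_dob(dob)   # composite once we zoom in to day, month and year
```

Which of the two views is "atomic" depends on the question being asked of the data, not on the value itself.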


References for further reading

Data Domain: Concept taken from DAMA

CDE: Critical Data Element concept from DCAM / EDMC.

ISO 11179-1:2015(E): The terms Term, Data Element Concept, Data Element and Data are used here. However, I am going one level below, to talk about an Instance of Data being at the atomic level. Of course, some may argue about what is truly atomic (especially for the example we chose, DoB), since a DoB can be broken down into year, month and day. However, applying the Data Element Concept, we understand the atomic level to depend on context.

About the Author

Daniel Rolles is the author – bio can be found here.