Data is pervasive in our modern-day lives. Typically, our data is stored in the form of files, with some more organized entities (e.g. corporations) also storing some sets of data in databases. Most of us interact with stored data on a daily basis, and files in particular are often at the center of successes and failures on many fronts. Files rule the business world, from presentations to legal documents, and they have an enormous impact on our personal lives.
Given the proliferation of social media, files are more important now than they have ever been, especially in the context of protecting our personal data. Ultimately, the modern-day digitization of most data (files being a major factor here) is making protection of said data more important than ever. Look no further than DocuSign, which even lawyers are now using for signed documents that can be used in legal proceedings.
For the sake of this article there will be no differentiation between files and messages (as in messages sent via Signal or WhatsApp). The bottom line is that our goal is to protect a set of important information that exists as an identifiable contiguous block of data. The identifiable aspect is important because, for instance, if your software can identify the type of a file, then so can an attacker if unauthorized access is achieved.
The problem is that files, and/or messages for that matter, in their complete and identifiable contiguous block form, especially at rest, are easy static targets for attackers. The typical mistake made in this respect is to rely on network security and/or file server/desktop security to protect our data at rest.
What happens when those traditional mechanisms fail? Your files are there, naked and exposed, ripe for the picking by entities looking to steal/exfiltrate your data. Just look at the effectiveness of ransomware: it is only successful because a victim’s files are static and stationary targets.
In transit, it is not rocket science to reconstruct file objects given an understanding of the communications protocol in use (e.g. SMB/CIFS, NFS, HTTP). If an entity were to strategically inject itself on the proper physical port of the proper switch (in the context of data routing), it could copy a file intercepted during transmission and take that copy offline for attack purposes. The best part (from the attacker’s perspective) is that the recipient would never know this happened, because the transmitted file would arrive as if nothing had taken place. If you think nation states don’t have those capabilities, then clearly you are not in tune with modern-day interception capabilities (whether lawful or not).
Another problem in this entire space is that of patterns and pattern recognition. As human beings are creatures of habit, we unconsciously create patterns of data and/or working behaviors that give savvy attackers an edge in their acts of hostility. Moreover, modern-day advances in Artificial Intelligence (AI) make the detection of patterns, even over large data sets, far more feasible than in times past. Think of Machine Learning (ML) algorithms that create clusters of data, some of which do so based on pattern detection techniques.
Pattern recognition algorithms generally aim to answer the question of ‘most like’ matches based on input data. This is different from pattern matching algorithms, which aim to establish exact matches by applying pre-set patterns to input data. While typical ML algorithms operate at a shallow level, there is also the realm of deep learning, which works through multiple layers of depth. Ultimately, given the context of this article, one needs to understand that these are powerful tools that are now becoming integral parts of an attacking entity’s arsenal.
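To make the distinction concrete, here is a minimal sketch using only the Python standard library. The file names are invented for illustration, and `difflib` is a simple similarity measure standing in for the far more capable ML techniques an attacker might actually use.

```python
import re
from difflib import get_close_matches

# Hypothetical file names an attacker has already catalogued.
known = ["invoice_2023.pdf", "holiday_photo.jpg", "tax_return.docx"]

# A newly observed name that deviates slightly from the usual habit.
observed = "invoce_2024.pdf"

# Pattern matching: a pre-set pattern either fits exactly or it does not.
preset = re.compile(r"invoice_\d{4}\.pdf")
exact_hit = preset.fullmatch(observed) is not None

# Pattern recognition: ask which known item the observation is 'most like'.
closest = get_close_matches(observed, known, n=1, cutoff=0.6)

print("exact match:", exact_hit)
print("most like:", closest)
```

The pre-set pattern misses the misspelled name entirely, while the similarity search still ties it back to the victim's invoicing habit. That is the kind of edge habitual naming gives away.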
I enjoy conversations with security experts (it seems everyone is an expert these days, but that is another topic altogether) because many of them wave the encryption flag. They do so without realizing that, while encryption may increase the work factor for a successful breach, it will not stop said breach so long as an attacker can take the data set (e.g. a file), in its contiguous block form, offline and apply time and resources as elements of their attack methodology.
Once an encrypted file is residing within an attacker’s domain, attack possibilities are only limited by the attacking entity’s knowledge and access to resources (e.g. powerful hardware, advanced attack methodologies). And while there are limits today, in terms of attack capabilities, the future certainly looks like it will nullify, or at least reduce, some of those limits. It’s the normal cycle of this type of technology.
Think about it: there was a time when the Triple Data Encryption Algorithm (TDEA or 3DES) was effective. Time, resources, and knowledge all chipped away at that until, around July 2018, NIST officially reported it was retiring 3DES. It was first introduced in 1998, so it had a good run.
The Advanced Encryption Standard (AES) is the algorithm of choice these days and is used heavily. It is based on the Rijndael algorithm, which was picked during the NIST selection process and published as a standard in 2001.
Some attack types (a very limited list is presented here) to be aware of:
- Side channel attacks (e.g. timing attacks, differential fault analysis)
- Brute force attacks
- Boomerang attacks
- Linear cryptanalysis
The new approach
A new disruptive, innovative, and pro-active solution we have pioneered at nTropic Security is a product named KDisperse. It provides deep data protection based on multiple levels, or layers, of protective mechanisms applied to the data being protected. Beyond the layers, an underlying principle is to never establish, or expose, any pattern that an attacking entity can latch on to.
Some of the techniques/elements that make this methodology/solution effective in securing data:
- Distributed and disparate storage endpoints
- Entropy based sharding
- Entropy based obfuscation per shard
- Entropy based disinformation
- Entropy based naming per obfuscated shard
- Multiple randomly chosen storage endpoints
- Metamorphic shards
Distributed and disparate storage endpoints
Storage is cheap these days and has become commoditized, so building an ecosystem of distributed storage endpoints is pretty straightforward. The ecosystem should consist of a combination of local, network accessible, storage along with some cloud-based offerings. This type of hybrid ecosystem presents a great way to disperse obfuscated data elements, such that attacking entities are challenged in their exfiltration and/or data reconstruction efforts.
Entropy based sharding
During the phase of the process where sharding takes place, it is imperative to do this in such a way that two runs on the same data set never yield the same results with respect to shard sizes. Otherwise there is a pattern for attacking entities to work with.
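As a rough illustration of the idea (not KDisperse's actual implementation), the sketch below draws every shard size from the operating system's cryptographically secure random source. The function name `shard_randomly` and the size bounds are assumptions made for this example.

```python
import secrets

def shard_randomly(data: bytes, min_size: int = 16, max_size: int = 64) -> list[bytes]:
    """Split data into consecutive shards whose sizes come from a CSPRNG."""
    if not 1 <= min_size <= max_size:
        raise ValueError("need 1 <= min_size <= max_size")
    shards = []
    offset = 0
    while offset < len(data):
        # randbelow(n) returns 0..n-1, so size falls in [min_size, max_size].
        size = min_size + secrets.randbelow(max_size - min_size + 1)
        # The final shard may be shorter than min_size if the data runs out.
        shards.append(data[offset:offset + size])
        offset += size
    return shards

payload = secrets.token_bytes(1000)   # stand-in for a file's contents
run_a = shard_randomly(payload)
run_b = shard_randomly(payload)

print([len(s) for s in run_a])
print([len(s) for s in run_b])
```

Two runs over the same payload will, with overwhelming probability, print different size sequences, yet joining either run's shards in order gives back the original bytes.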
Entropy based obfuscation per shard
If an attacking entity knows the encryption algorithm used by its intended victim, they already have a key piece of the puzzle. KDisperse solves this by using a pool of algorithms in its operational mode so that once again there is no pattern and determining the algorithm used becomes a major hurdle to an attacking entity.
Entropy based disinformation
Disinformation is at the core of KDisperse: an attacker has the added challenge of trying to figure out what data (at the lowest levels) is real and what is not.
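One simple way to picture this is mixing decoy shards in with the real ones. The sketch below is an assumption about how that could look, not a description of KDisperse internals. It presumes the real shards have already been obfuscated so that they are indistinguishable from random bytes, and the name `add_decoys` is hypothetical.

```python
import secrets
import uuid

def add_decoys(real_shards: list[bytes], decoy_count: int) -> tuple[dict[str, bytes], list[str]]:
    """Return (all stored blobs by name, ordered names of the real shards)."""
    if not real_shards:
        raise ValueError("need at least one real shard")
    stored: dict[str, bytes] = {}
    real_order: list[str] = []
    for shard in real_shards:
        name = str(uuid.uuid4())
        stored[name] = shard
        real_order.append(name)
    for _ in range(decoy_count):
        # Copy the length of a randomly chosen real shard so that
        # size alone does not separate decoys from real data.
        size = len(secrets.choice(real_shards))
        stored[str(uuid.uuid4())] = secrets.token_bytes(size)
    return stored, real_order

real = [secrets.token_bytes(32), secrets.token_bytes(48)]
stored, real_order = add_decoys(real, decoy_count=3)
recovered = b"".join(stored[name] for name in real_order)
```

Only the holder of `real_order` can tell which blobs matter and how to reassemble them. Everyone else is looking at five equally plausible candidates.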
Entropy based naming per obfuscated shard
In the spirit of giving nothing away via patterns, each obfuscated shard is named randomly using a version 4 UUID (Universally Unique Identifier).
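In Python, for instance, version 4 UUIDs are available from the standard library. The helper name `shard_name` is made up for this sketch.

```python
import uuid

def shard_name() -> str:
    """Return a random (version 4) UUID string to use as a shard name."""
    # uuid4() builds the identifier from random bits, so the name carries
    # no trace of the source file, the shard's position, or a timestamp.
    return str(uuid.uuid4())

names = [shard_name() for _ in range(5)]
for name in names:
    print(name)
```

Unlike names such as `report.docx.part3`, these identifiers tell an attacker nothing about which file a shard belongs to or where it sits in the sequence.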
Multiple randomly chosen storage endpoints
So as not to establish a pattern KDisperse randomly chooses storage endpoints that are under the control of the entity looking for this depth of protection.
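A minimal sketch of that selection step follows. The endpoint labels and the function name `place_shards` are placeholders invented for this example. A real deployment would map them to actual storage targets under the owner's control.

```python
import secrets
import uuid

# Hypothetical endpoints controlled by the data owner.
ENDPOINTS = ["local-nas", "office-file-server", "cloud-bucket-a", "cloud-bucket-b"]

def place_shards(shard_names: list[str], endpoints: list[str]) -> dict[str, str]:
    """Assign each shard to an endpoint chosen independently at random."""
    if not endpoints:
        raise ValueError("need at least one endpoint")
    # secrets.choice draws from the OS's secure random source, so the
    # placement cannot be predicted from earlier placements.
    return {name: secrets.choice(endpoints) for name in shard_names}

shard_names = [str(uuid.uuid4()) for _ in range(6)]
placement = place_shards(shard_names, ENDPOINTS)
for name, endpoint in placement.items():
    print(name, "->", endpoint)
```

Because each choice is independent, there is no round-robin or size-based rule for an attacker to infer. The trade-off is that the mapping itself becomes sensitive information that must be protected.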
Metamorphic shards
Metamorphic shards are shards that are rewritten over multiple iterations, such that each succeeding version of each shard is different from the preceding one.
Pure sharding of data is nothing new, as information dispersal algorithms that perform this have been around for some time. Encryption algorithms have likewise been around for a long time, and each of these techniques certainly does positive things in terms of increasing an attacker’s work factor. Our contention is that a combination of those two techniques, along with the use of disinformation, will help future-proof the protection we apply to the data sets we are aiming to secure, and that this union of techniques will provide deeper security, with even more effective work factor increases, than any one of those techniques alone. KDisperse provides a seamless pathway (via API calls) to the deep security described in this article, where all of the complexity is abstracted away from the entity leveraging it to secure their data resources.
KDisperse aims to seamlessly integrate with existing software entities (e.g. web applications, mobile apps) via REST APIs, making it a very modern approach to enhancing data-level security and information privacy.