Author: Alexander Raif, Chief Healthcare Cybersecurity Architect, Maccabi Health Care Services
DNA, or deoxyribonucleic acid, unites us with the rest of the world's population. It is well known, for example, that the genome of any one person differs from the genome of any other person by only about 0.1%.
Knowing your ethnic background, creating your family tree or finding distant relatives is easier today thanks to companies like 23andMe, Ancestry.com and MyHeritage that allow you to take a DNA test, no matter where you live. A DNA test is the analysis of genes inside human cells and the comparison of extracted data with the genes of certain ethnic groups. Or, if you wish, you can determine the genetic relationship with other people, as, for example, in the more familiar paternity test. Some variations of the test can even establish a predisposition to certain diseases and biological features of the body.
The procedure is simple: about $100, a saliva sample mailed in a jar to the company’s address, and a couple of clicks.
In addition to the ethnic origin of a person up to the 7th generation, a DNA test allows you to determine genetic susceptibility to certain diseases, such as Alzheimer’s disease, diabetes, or cancer. You can also find out the likelihood of becoming dependent on certain substances. Not to mention the physical features of the body, such as gender, eye color and other features that may help identify you.
This is very intimate data that should be kept completely confidential, and it would be unpleasant if this data were to be transferred to a third party. So how can it harm you?
The obvious risk is discrimination in hiring or when applying for insurance. And what if the employer provides the insurance? The employer might then have access to the data as well.
When you ask to perform a DNA test, you are presented with a consent form. On the consent form, you are also asked if you agree to share parts of your DNA results on a global database.
To find distant relatives, you can upload the genetic markers obtained from the test to the open, publicly accessible GEDmatch database. Because relatives share similar markers, the system compares your markers against the database and finds your distant relatives, provided they have also uploaded their DNA test data to the platform. In the US, the police managed to identify a serial killer after 30 years of investigation. They had a sample of his DNA and, using the GEDmatch platform, were able to identify distant relatives of his who had placed their data in the system. This narrowed the circle of suspects to one family, and soon the killer was identified.
Since we share much of our genetic code with family members, it is worth remembering that by giving permission to process our DNA, we also reveal data about our relatives, who might prefer to keep it private.
DNA typing was primarily used for comparison in criminal cases, in order to identify criminals.
In several respects, this is analogous to the use of latent fingerprints.
Privacy and Information Security Risk
Obviously, this is not a risk that the genetic-testing industry alone faces, but it is an industry that has a unique set of information on its consumers. And there was a recent hack in the space. More than 92 million accounts from the genealogy and DNA testing service MyHeritage were found on a private server, the company announced earlier this month. DNA data, specifically, was not breached, the company said. But a hack in this space is still a concern.
One question is asked constantly: is it really possible for anonymized DNA data to be hacked and traced back to a person?
According to two studies published in 2013, yes: researchers showed that it was indeed possible to identify people from anonymous DNA information.
How? What are the actual dangers?
Modern methods of collecting genomic data make it possible to store samples for a long time. If an attacker gains access to the test tube with your blood or saliva, extracting DNA from it is only a matter of technique. Samples should therefore be carefully protected: access to the laboratory should be restricted by a reliable access control system, and the test tubes should not carry personal data, so that no one, including the laboratory staff, can identify to whom they belong.
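The idea of test tubes that carry no personal data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SampleRegistry` class, the `S-` label format, and the `caller_is_authorized` flag are hypothetical names invented for this example, and a real laboratory system would keep the mapping in a separate, access-controlled database rather than in memory.

```python
import secrets


class SampleRegistry:
    """Hypothetical registry mapping random tube labels to donor identities.

    Assumption: the mapping lives in a separate, access-controlled system.
    Only the opaque label is ever printed on the tube.
    """

    def __init__(self):
        self._label_to_donor = {}

    def register(self, donor_id: str) -> str:
        # A random label carries no information about the donor.
        label = "S-" + secrets.token_hex(8)
        while label in self._label_to_donor:  # collision is extremely unlikely
            label = "S-" + secrets.token_hex(8)
        self._label_to_donor[label] = donor_id
        return label

    def resolve(self, label: str, caller_is_authorized: bool) -> str:
        # Re-identification is a separate, explicitly authorized step.
        if not caller_is_authorized:
            raise PermissionError("caller may not re-identify samples")
        return self._label_to_donor[label]


registry = SampleRegistry()
tube_label = registry.register("donor-0001")
print(tube_label)  # a random label such as "S-" followed by 16 hex characters
```

Because the label is random rather than derived from the donor's name or ID number, laboratory staff who handle the tube learn nothing about its owner unless they are also granted access to the registry.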
Anonymized, or de-identified, genetic sequences and shared genomic information both have many benefits, but they also present some serious security risks or, at least, potential future security risks, according to genetic testing authority Adam Tanner in his 2016 article, “The Promise & Perils Of Sharing DNA” on Undark.org. The following are a few of the risks Tanner describes:
- Because of the nature of DNA, even de-identified genetic sequences may be re-identifiable in the future. Essentially, someone may figure out a way to use an unidentified DNA sequence to produce the name of the person to whom that DNA belongs.
- Genetic information can potentially be used for identity theft purposes, and this possibility will very likely increase as technology improves.
- People who have been identified as prone to certain diseases or conditions could be targeted by marketing firms or singled out as a risk by insurers and potential employers. The popular 1997 speculative fiction film Gattaca explores just such a possibility.
- Even people who never submit to genetic testing could be compromised by blood relatives who have their DNA sequenced.
Genomic research data must be protected at each step of the genomic lifecycle.
Figure 1: Schematic of the workflow of clinical genomic sequencing, from NCBI (National Center for Biotechnology Information, U.S. National Library of Medicine)
Patient or customer information at the stage of prescreening and DNA sampling must be protected in all medical systems.
All medical and diagnostic data must be protected by best practice methods.
Lab equipment and sequencing data must be segregated from other IT systems and protected separately.
For genomic data management, there are many hardware and software options for the storage and transit of DNA data. Data can be stored on an organization’s local servers or in a cloud service such as Amazon Web Services or Google Cloud. Data can be moved via dedicated internal cables within an organization, or over the Internet, or physically via portable hard drives. For DNA data that is stored only (or largely) in the cloud, data transit can be limited through cloud computing practices, essentially requiring any analysis software to be “brought to the data” in the cloud rather than the data being analyzed by software on a local machine.
Before we conclude, I want to mention another aspect of DNA data.
How will the DNA storage industry evolve?
Scientists have long said that DNA could be an ideal repository of information: it is dense, stable, and easy to copy. Over the past few years, researchers have encoded many different kinds of data in DNA, for example “War and Peace” by Leo Tolstoy, the song “Smoke on the Water” by Deep Purple, and a GIF of a running horse. But for DNA to be able to replace existing silicon and magnetic drives, writing, reading and storing it needs to become cheaper and simpler.
If successful, this technology will make it possible to archive data that needs to be kept secure for legal reasons, such as rare surveillance camera records, medical data and historical government documents. This is a new area of technology, and it will bring a new era of risks.
We must use and enhance the tools and procedures we all know today, such as:
- Encryption: Ensuring that data is encrypted in transit and at rest.
- Authentication: Verifying the identity of individuals accessing the data.
- Two-factor authentication: involving a token sent to a mobile device or key fob, providing additional validation.
- For sensitive data, in-person authentication could be required.
- Authorization: Narrowing the number of people with access to data based on the project or task, and limiting the duration of that access.
- Monitoring and auditing: Assessing and improving system security, and tracking details of use, unauthorized use and compliance. This includes routine vulnerability assessments and penetration tests. Using a blockchain ledger system may be one way to verify the history of data access throughout the data lifecycle.
- De-identification: Stripping an individual’s identity from the DNA data is useful, but it does not achieve true anonymization in many circumstances (as we discussed above).
At the same time, despite these technological protections, all these options remain vulnerable to risks that arise from human behavior. In general, many data spills result from non-compliance by individuals who are authorized to use sensitive data.
Genomics offers bold promises for revolutionizing medicine as we know it—some of which have already been realized. There are tremendous potential benefits from genomics, revealing new life-saving and life-enhancing discoveries for precision medical care. Genomics databases will play an important role in achieving those breakthroughs. At the same time, we also need to appreciate the serious risks that the disclosure of sequenced DNA results poses for individuals.
If indeed the genomics field is at a critical inflection point, as many believe, then this is a crucial point for us to wrestle with the tensions inherent in promoting future research while at the same time safeguarding individual privacy.



