A potential mine of rich information on user habits; a repository of data that has the power to transform business; the lifeblood of the cloud and a major stimulus of innovation: big data makes some big promises. Unfortunately, the focus on these benefits has meant the drawbacks have been neglected. For instance, little attention has been paid to how to secure this colossal amount of data and the sensitive information that is typically extracted from it.
Today’s big data projects are larger and far more granular than traditional large dataset analysis. Their volume and real-time nature demand fast, intelligent processing. Delivering the four V’s (volume, variety, velocity and veracity) requires parallel processing on a scale that most organisations have probably never seen before. Done properly, harnessing this power gives organisations a true finger on the pulse of what is actually happening, today, in specific areas of the business.
The biggest challenge lies in anonymising data sets and ensuring that the identity of the user is not compromised when that data is produced, stored or used. The risk also shifts over time, because data changes as it is used. Understanding the value of your data may seem like an obvious step in securing it, but when that data is constantly evolving, it becomes imperative. The addition of other data can also pave the way for what is known as a jigsaw attack. For instance, taken in isolation, the age, location or marital status of an individual would preserve their anonymity, but if those attributes are combined and the combination is unusual, it’s much easier to deduce the identity of the person the data refers to. The UK Anonymisation Network (UKAN) uses the example of a 16-year-old widow.
This process is also called deanonymisation, and it can see someone reidentified from associated data sets. Many groups have expressed concern over reidentification, such as EAGDA, which suggested the Genomes Project could expose data on specific individuals. Examples of reidentification already exist, with individuals identified from data sets that AOL and Netflix had released as anonymised.
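The jigsaw risk described above can be tested for before a data set is released. The sketch below is illustrative only: the records are invented, and the function name `risky_records` and the threshold `k` are assumptions rather than part of any UKAN or ICO guidance. It counts how many records share each combination of attributes and flags any combination held by fewer than `k` people.

```python
from collections import Counter

# Invented records; age, location and status are the attributes named in the text.
records = [
    {"age": 16, "location": "Leeds", "status": "widowed"},
    {"age": 34, "location": "Leeds", "status": "married"},
    {"age": 34, "location": "Leeds", "status": "married"},
    {"age": 16, "location": "Leeds", "status": "single"},
    {"age": 16, "location": "Leeds", "status": "single"},
]


def risky_records(rows, attributes, k=2):
    """Return rows whose combination of attributes is shared by fewer than k rows."""
    def key(row):
        return tuple(row[name] for name in attributes)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] < k]


# Age alone singles nobody out in this sample...
age_only = risky_records(records, ["age"])

# ...but the combined attributes isolate the 16-year-old widow.
combined = risky_records(records, ["age", "location", "status"])
print(combined)
```

Running the same check again whenever a new data source is joined in is one way to catch combinations that only become unusual as the data evolves.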
Yet anonymisation is only part of the solution. If anonymisation is too rigorously applied, the data can also become worthless, making data analysis pointless. As with many security controls, it’s getting the balance right between protection and facilitation, so that security aids rather than obstructs the process.
The size of the data set must also be considered. After all, big data is about the four V’s: volume, variety, veracity and velocity. If the controls planned to protect the data hinder the ability to access and process it at speed, you impact velocity. This is often overlooked when applying security controls to big data projects.
The ICO (Information Commissioner’s Office) offers some good advice in ‘The Data Protection Code of Practice on Managing the Risks Related to Anonymisation’ when it comes to handling big data. It effectively states that, provided the organisation can demonstrate it has used effective anonymisation, the purpose limitation requirements of the Data Protection Act do not apply.
This is not a ‘get out of jail free’ card, however. Demonstrating due diligence will often require the company to go above and beyond the requirements of the DPA, so it’s merely a case of one cancelling out the other. Effective anonymisation involves identifiers being removed, obscured, aggregated or altered. The grey area is what constitutes an identifier. The ICO refers to formal identifiers such as name, address or social security number, but others believe informal identifiers, which make the subject vulnerable to identification, should also be covered. For this reason, we’ve seen other strategies come to the fore, such as layered anonymisation.
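To make the four treatments concrete, here is a minimal sketch of a record having its identifiers removed, obscured, aggregated and altered. The field names, the `anonymise` function and the key are all hypothetical, and the choices shown (ten-year age bands, keeping only the first half of a postcode) are assumptions for illustration, not ICO recommendations. Note that a keyed hash is strictly pseudonymisation: anyone holding the key can recompute the reference, so it counts as obscuring only while the key is protected.

```python
import hashlib
import hmac

# Hypothetical key; in practice this would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def anonymise(record, key=SECRET_KEY):
    """Return a copy of the record with its identifiers treated."""
    decade = (record["age"] // 10) * 10
    return {
        # Removed: the name is simply not carried over.
        # Obscured: the customer ID becomes a keyed hash.
        "customer_ref": hmac.new(
            key, record["customer_id"].encode(), hashlib.sha256
        ).hexdigest()[:16],
        # Aggregated: exact age becomes a ten-year band.
        "age_band": f"{decade}-{decade + 9}",
        # Altered: only the first half of the postcode is kept.
        "area": record["postcode"].split()[0],
        # Analytical value is carried over untouched.
        "spend": record["spend"],
    }


sample = {
    "name": "A. Example",
    "customer_id": "C-1001",
    "age": 37,
    "postcode": "LS1 4AP",
    "spend": 120.5,
}
anonymised = anonymise(sample)
print(anonymised)
```

The width of the age band and the amount of postcode retained are the dials that trade protection against usefulness: widen them too far and the analysis loses its value.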
Organisations should also have policies in place for how to respond in the event that deanonymisation occurs. As with any incident response plan, measures should be put in place to mitigate the effects of disclosure and there should be a traceable anonymisation audit trail.
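One possible shape for such an audit trail is sketched below. It is an assumption about how a team might do it, not a prescribed format: the function names and entry fields are invented for illustration. Each entry records which technique was applied to which fields, and carries a hash of the previous entry so that later tampering can be detected.

```python
import hashlib
import json


def _digest(body):
    """Hash an entry body in a stable, key-sorted form."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(trail, dataset, technique, fields, operator, timestamp):
    """Append an anonymisation step to the trail, chained to the previous entry."""
    entry = {
        "dataset": dataset,
        "technique": technique,
        "fields": fields,
        "operator": operator,
        "timestamp": timestamp,
        "previous_hash": trail[-1]["entry_hash"] if trail else "",
    }
    entry["entry_hash"] = _digest(entry)
    trail.append(entry)
    return entry


def verify_trail(trail):
    """Return True if every entry is intact and correctly chained."""
    previous_hash = ""
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["entry_hash"] != _digest(body):
            return False
        previous_hash = entry["entry_hash"]
    return True


# Invented example entries.
trail = []
append_entry(trail, "customers_2015", "aggregate", ["age"], "analyst_a", "2015-06-01T09:00:00Z")
append_entry(trail, "customers_2015", "remove", ["name"], "analyst_a", "2015-06-01T09:05:00Z")
print(verify_trail(trail))
```

If deanonymisation does occur, a trail like this lets the response team see exactly which treatments were applied, by whom and when.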
It’s also important to note that anonymisation is only part of the answer. Effective big data security needs to take a holistic path, involving the application of technical controls, effective education and training, the adoption of formal policies and procedures for handling, sharing, releasing and reporting, and physical and network security measures. These will need to be adapted, though. For instance, access control and storage will both need to be different because of the unstructured format and ad-hoc querying typical of big data analysis.
- Align your security and functional project requirements – Work closely with your IT capability to ensure you have the necessary tools for the programme now and in the future. Promoting cross-department collaboration in how the programme is orchestrated will ensure you are prepared for the project’s evolution.
- Identify and anticipate compliance requirements – Organisations must work closely with their governance and compliance departments to head off any future compliance issues. Jeopardising your organisation’s compliance status could be extremely costly in some cases; collaboration will help avoid this problem. The most common compliance requirement will be the Data Protection Act and its rules on the use of personal data, but the EU General Data Protection Regulation will also need to be considered.
- Update governance frameworks – Traditional approaches to data handling must be adapted to achieve the four V’s and accommodate data value evolution. Those behind governance and compliance frameworks know that big data strategies will, in most cases, have a big impact on data content and handling requirements. In an ideal world, the frameworks themselves would offer far more assistance; perhaps closer working between big data alliances and frameworks such as ISO 27001 will help to achieve this.
- Develop formal security policies for the programme and organisation – For example, what happens if a big data analyst stumbles across rich data sets which highlight the location of vulnerable individuals or financial details? Team members need to know where to go and what to do. Having policies in place to assist in handling data reduces the possibility of losses. Big data security policies must tell the team what is required during Business as Usual (BAU) so as to reduce the value and impact of any losses that do occur.