Big Privacy for Big Data

Managing the Risks of Data Analytics

Big data analytics has tremendous potential. With it, you can weave together massive amounts of disparate information and discover unforeseen patterns that enable you to optimize marketing, distribution, staffing, resource allocation, and virtually any other aspect of your operations. That power is the source of both its value and considerable privacy risk. Big data initiatives often collect, use, and share highly detailed personal information without individuals’ awareness or understanding. Even when consent is sought, data is frequently used and linked in unanticipated ways to which individuals have not agreed.

Privacy laws restrict how personal data may be collected, yet big data initiatives are collecting, aggregating, and sharing ever larger volumes of it, and regulation has yet to catch up. Many of the big players on the Internet – Amazon, Google, Yahoo!, Microsoft, Twitter – could now be classified as data brokers, collecting personal data and then selling or sharing it for value. Public sector agencies, healthcare organizations, and financial institutions also conduct business with information brokers, and data is aggregated and shared within these sectors.

If your organization uses big data, it also needs big privacy, both to avoid a breach and to remain compliant with privacy legislation. A strong privacy framework, with effective training, tools, and implementation processes, is mandatory to govern big data initiatives.

Big Data and Privacy Regulation​

The ground rules of personal data protection are essentially no different in a big data environment than in more conventional contexts; Canadian organizations should only collect, store, use, and share personal information for specific purposes to which an individual has consented. ​

The Office of the Privacy Commissioner of Canada has ruled that information collected through online tracking generally constitutes personal information, and that under the Personal Information Protection and Electronic Documents Act (PIPEDA), website owners have a legal responsibility to protect the personal information of the people using their website. Further, they are responsible for obtaining meaningful consent from individuals to the collection and use of their information. Meaningful consent can use either an opt-in or an opt-out model. The key is that individuals are notified of the purposes of data collection, of all the parties who will have access to the data, and of the relevant privacy and security practices. Website owners should avoid collecting sensitive information, such as medical information, and should destroy or de-identify data as soon as possible.

The Privacy Commissioner’s 2011 Guidelines on Privacy and Online Behavioural Advertising are a must-read for companies that use web analytics.​

Principles of Big Privacy

In addition to technological solutions, big privacy for a big data environment means a mature privacy program with effective data governance practices. Four principles of big privacy are: ​

1. Data Lifecycle Management

The protective measures set out in your privacy policies must be maintained throughout the information life cycle. Your organization must track and maintain every security and privacy requirement that protects a dataset at each stage, from collection through use, retention, disclosure, and destruction. Individuals should be notified of these practices at the time of collection.

Data management can be supported by software solutions that track data flows. This begins with effective access management. Auditing and monitoring are essential; predictive audit analytics can automatically flag records that have been accessed by unauthorized personnel. It is important to track not only the flows of raw data but also the inferences made by your data analysis programs, to ensure that all data is being collected, retained, used, and disclosed for a permitted purpose.
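To make the auditing idea concrete, here is a minimal Python sketch of an access-log check. Everything in it is an assumption for illustration: the AccessEvent fields, the AUTHORIZED_PURPOSES table, and the find_unauthorized helper are hypothetical names, not part of any particular audit product. It simply flags any access by an unknown user, or for a purpose that user has not been granted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One entry in a (hypothetical) access log."""
    user: str
    record_id: str
    purpose: str


# Hypothetical access policy: the purposes each user is authorized for.
AUTHORIZED_PURPOSES = {
    "alice": {"billing"},
    "bob": {"billing", "marketing"},
}


def find_unauthorized(events, policy):
    """Return events whose user is unknown or whose purpose is not permitted."""
    return [e for e in events if e.purpose not in policy.get(e.user, set())]


events = [
    AccessEvent("alice", "r1", "billing"),     # permitted
    AccessEvent("alice", "r2", "marketing"),   # purpose not granted to alice
    AccessEvent("mallory", "r1", "billing"),   # user not in the policy at all
]

flagged = find_unauthorized(events, AUTHORIZED_PURPOSES)
for e in flagged:
    print(f"FLAG: {e.user} accessed {e.record_id} for {e.purpose}")
```

A real deployment would read events from your logging system and would likely record derived data (inferences) as records in their own right, so that they pass through the same check.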

2. Anonymization of Secondary Use Data 

Where personal information (PI) is to be used for secondary purposes, including external (and sometimes internal) sharing, it must be anonymized or de-identified to protect individual privacy. Unfortunately, data analytics can reverse many older de-identification methods by re-linking correlated data. If data is not completely anonymized, the possibility and risk of linking different data sets must be evaluated. The best-practice approach is to use newer database anonymization programs such as Aircloak, which place a personal information filter between the analyst and the data, making it possible to query a dataset without viewing granular details. This method of anonymization enables real-time, high-quality data analytics with little privacy risk.
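The idea of a filter that sits between the analyst and the data can be illustrated with a very small Python sketch. This is not how Aircloak or any specific product works; it shows only one simple technique, small-group suppression, under the assumption that the analyst may see aggregate counts but never individual rows. The safe_count_by function and the MIN_GROUP_SIZE threshold are hypothetical.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # assumed threshold; choose your own based on a risk assessment


def safe_count_by(rows, column, min_group_size=MIN_GROUP_SIZE):
    """Count rows per value of `column`, dropping groups too small to report."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n >= min_group_size}


# Illustrative data only; the analyst never receives these rows directly.
rows = [
    {"name": "A", "city": "Toronto"},
    {"name": "B", "city": "Toronto"},
    {"name": "C", "city": "Toronto"},
    {"name": "D", "city": "Toronto"},
    {"name": "E", "city": "Whitehorse"},
]

# The single Whitehorse resident is suppressed; only the Toronto group is reported.
print(safe_count_by(rows, "city"))
```

Suppression on its own does not stop an analyst from comparing the answers to several overlapping queries, which is one reason purpose-built anonymization tools add further protections.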

3. Evaluation of Data Recipients

If your organization shares individual-level data, you’re obligated under PIPEDA to ensure that data recipients’ security and data privacy policies provide “a comparable level of protection” to your own. Be sure to carefully evaluate the privacy and security practices of third parties requesting access to data. Even if the data is de-identified, you will need to evaluate the risk that linking data sets from several sources could re-identify individuals. Ensure that any company to which you sell data, or with which you share it, can demonstrate that its own privacy practices are in full compliance with PIPEDA, and that data in its care is protected at all stages of its life cycle.
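One rough way to gauge re-identification risk before sharing a de-identified data set is to measure how many records are unique on the fields a recipient could plausibly match against other sources. The Python sketch below assumes those fields (often called quasi-identifiers) are a postal code prefix, birth year, and sex; the field names and the unique_fraction helper are hypothetical, and a full risk assessment would consider much more than this one number.

```python
from collections import Counter


def unique_fraction(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination appears exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(rows)


# Illustrative, already de-identified records (no names or account numbers).
rows = [
    {"postal_prefix": "M5V", "birth_year": 1980, "sex": "F"},
    {"postal_prefix": "M5V", "birth_year": 1980, "sex": "F"},
    {"postal_prefix": "M5V", "birth_year": 1975, "sex": "M"},
    {"postal_prefix": "K1A", "birth_year": 1990, "sex": "F"},
]

risk = unique_fraction(rows, ["postal_prefix", "birth_year", "sex"])
print(f"{risk:.0%} of records are unique on the chosen fields")
```

The more fields a recipient can match on, the higher this fraction tends to be, which is why the same data set can be low risk for one recipient and high risk for another.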

4. Legislative Compliance 

Your organization must identify and understand all the privacy regulations that apply to the data you store, process, and transmit. Make sure that you are aware of the exact definitions of terms such as “personal information,” “personal data,” “identifying information,” “permissible purposes,” and “permissible disclosures” in every jurisdiction relevant to your organization.

For any new (or redesigned) project, I strongly recommend conducting a Privacy Impact Assessment (PIA) to identify and address potential privacy risks and to ensure that your personal information management practices are aligned with PIPEDA. A PIA acts as an early warning system, showing you where you need to build safeguards into your data management practices.

Legal compliance mapping tools, or data protection management software, can also help you assess your privacy practices by creating a detailed map of data flows and compliance obligations.


Big data analysis can be highly advantageous for your company, but with this power comes the responsibility of protecting the personal information of your data subjects. Meeting big privacy standards means re-examining your legal and policy obligations, risk management practices, and the processes, tools, and governance of information management in light of the greater interconnection between data sets, individuals, and organizations.

This is an obligation, but also an opportunity. A significant majority of Internet users want more control over their personal data. A strong privacy framework that gives individuals a sense of agency and safety can establish (or enhance) your reputation as a privacy leader – a smart business move.


This article is based on my book, Privacy In Design: A Practical Guide to Corporate Compliance.