Data for Democracy Project

Monitoring Elections

Directed by Dr. Waël Hassan, the Data for Democracy Project (DDP) aims to identify and recommend strategies, tools, and technology to protect democratic processes and systems from social media misinformation and disinformation. By creating a technology solution that permits observers to monitor election interference, we offer a unique and bipartisan approach to election monitoring. Built by experts in technology and leaders in the cyber and national security world, the DDP intends to offer concrete solutions to an urgent problem.

See more details on the newly established project page.


Moving from Access Control to Use Control

In an AI world, we can no longer talk only about access control; the conversation has to shift to use control.


Is AI Compatible with Privacy Principles?

Bringing Privacy Regulation into an AI World, Part 2

This seven-part series explores, from a Canadian perspective, options for effective privacy regulation in an AI context.

Many experts on privacy and artificial intelligence (AI) have questioned whether AI technologies such as machine learning, predictive analytics, and deep learning are compatible with basic privacy principles. It is not difficult to see why: while privacy is primarily concerned with restricting the collection, use, retention, and sharing of personal information, AI is all about linking and analyzing massive volumes of data in order to discover new information.

“AI presents fundamental challenges to all foundational privacy principles as formulated in PIPEDA [Canada’s Personal Information Protection and Electronic Documents Act].”

Office of the Privacy Commissioner of Canada

The Office of the Privacy Commissioner (OPC) of Canada recently stated that “AI presents fundamental challenges to all foundational privacy principles as formulated in PIPEDA [Canada’s Personal Information Protection and Electronic Documents Act].”[1] The OPC notes that AI systems require large amounts of data to train and test algorithms, and that this conflicts with the principle of limiting collection of personal data.[2] In addition, organizations that use AI often do not know ahead of time how they will use data or what insights they will find.[3] This certainly appears to contradict the PIPEDA principles of identifying the purposes of data collection in advance (purpose specification), and collecting, using, retaining, and sharing data only for these purposes (data minimization).[4]

So, is it realistic to expect that AI systems respect the privacy principles of purpose specification and data minimization?

I will begin by stating clearly that I believe that people have the right to control their personal data. To abandon the principles of purpose specification and data minimization would be to allow organizations to collect, use, and share personal data for their own purposes, without individuals’ informed consent. These principles are at the core of any definition of privacy, and must be protected. Doing so in an AI context, however, will require creative new approaches to data governance.

I have two suggestions towards implementing purpose specification and data minimization in an AI context:

  1. Require internal and third-party auditing

Data minimization – the restriction of data collection, use, retention, and disclosure to specified purposes – can be enforced by adding regular internal auditing and third-party auditability to legal requirements.

As currently formulated, the Ten Fair Information Principles upon which PIPEDA is based do not specifically include auditing and auditability. The first principle, Accountability, should be amended to include requirements for auditing and auditability. Any company utilizing AI technologies – machine learning, predictive analytics, and deep learning – should be required to perform technical audits to ensure that all data collection, retention, use, and disclosure complies with privacy principles. AI systems should be designed in such a way that third-party auditors can perform white-box assessments to verify compliance.
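
To make this concrete, here is a minimal sketch of the kind of automated check an internal or third-party auditor could run; the record identifiers and purpose names are hypothetical, and this illustrates the principle rather than prescribing an implementation. Every logged processing event must cite a purpose that was declared to, and consented to by, the data subject at collection time.

```python
from dataclasses import dataclass

@dataclass
class ProcessingEvent:
    record_id: str   # which personal-data record was touched
    action: str      # "collect", "use", "retain", or "disclose"
    purpose: str     # the purpose cited by the system at processing time

# Hypothetical register of the purposes declared to, and consented to by, each data subject.
declared_purposes = {
    "rec-001": {"membership_management", "facility_access"},
    "rec-002": {"membership_management"},
}

def audit(events):
    """Flag every event whose cited purpose was never declared for that record."""
    return [e for e in events
            if e.purpose not in declared_purposes.get(e.record_id, set())]

events = [
    ProcessingEvent("rec-001", "use", "facility_access"),          # compliant
    ProcessingEvent("rec-002", "disclose", "targeted_marketing"),  # never consented to
]
for violation in audit(events):
    print("Non-compliant event:", violation)
```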

  2. Tie accountability to purpose of collection

The core of the concept of data minimization is that personal data should only be collected for purposes specified at the time of collection, to which data subjects have given consent. While in AI contexts, data is increasingly unstructured and more likely to be used and shared for multiple purposes, data use and disclosure can still be limited to specified purposes. Data minimization can be enforced by implementing purpose-based systems that link data to specific purposes and capture event sequences – that is, the internal uses of the data in question.

To that end, I suggest the following:

i) Canadian privacy law very clearly states that the collection, retention, use, and disclosure of personal data must be for a specified purpose. As I mentioned above, the fair information principle of accountability should be revised to require audits that demonstrate that all collection, use, retention and disclosure is tied to a specified purpose, and otherwise complies with all other fair information principles.

ii) Organizations should be required to prove and document that the sequences of events involved in data processing are tied to a specified purpose.
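
As an illustration of what such documentation could look like – the field names here are my own assumptions, not a standard – an event-sequence log ties every internal use of a record to the purpose it serves, so that the full chain of processing can be produced on demand:

```python
import json
from datetime import datetime, timezone

def log_event(audit_log, record_id, action, purpose, detail):
    """Append one step in the chain of processing events for a personal-data record."""
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "record_id": record_id,
        "action": action,     # e.g. "collect", "link", "infer", "disclose"
        "purpose": purpose,   # the specified purpose this step serves
        "detail": detail,
    })

audit_log = []
log_event(audit_log, "rec-001", "collect", "membership_management", "sign-up form")
log_event(audit_log, "rec-001", "disclose", "loyalty_partnership",
          "demographic data and visit frequency shared with partner")

# The complete sequence can be produced on demand for a regulator, auditor, or data subject.
print(json.dumps(audit_log, indent=2))
```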

To continue with the example from my previous post on legislating AI:

The robotics club of which Alex is a member announces it has a partnership with Aeroplan. Under current regulations, notifying members of this data sharing partnership is sufficient, as long as the club points to Aeroplan’s privacy policy. However, given the advanced capacities of AI-enhanced data processing, the club should spell out which specific data processing activities will be applied to the data.

For example, the club’s privacy policy could include the following:

“As part of our partnership with Aeroplan, we may share the data we collect on you with Aeroplan, including your demographic data (your age and address, for example), and the frequency of your visits to our various club locations.

Aeroplan will provide us with information about you, including your income class metrics (your approximate gross earnings per year, and the band of your gross annual earnings) and information regarding your online activities and affinities; for example, your preferred gas station brand and favourite online stores, combined with the volume of your purchases.”

This notification provides a much clearer explanation of the purpose of the club’s partnership with Aeroplan than is currently standard in privacy policy text. It informs clients about data collection and sharing practices, as is required, but also describes the types of personal information that are being inferred using data analytics. With this information, clients are in a much better position to decide whether they are comfortable sharing personal data with organizations that will use it for targeted marketing.

AI will require new approaches to enforcing the data protection principles of data minimization and purpose specification. While AI systems have the capacity to greatly increase the scope of data collection, use, retention, and sharing, they also have the capacity to track the purposes of these data processing activities. Maintaining the link between data and specified purposes is the key to enforcing privacy principles in a big data environment.


[1] Office of the Privacy Commissioner of Canada, Consultation on the OPC’s Proposals for ensuring appropriate regulation of artificial intelligence, 2020.

[2] Centre for Information Policy Leadership, First Report: Artificial Intelligence and Data Protection in Tension, October 2018, pp. 12–13. The Office of the Victorian Information Commissioner, Artificial Intelligence and Privacy, 2018.

[3] The Office of the Victorian Information Commissioner, Artificial Intelligence and Privacy, 2018. See also the blog post by lawyer Doug Garnett, AI & Big Data Question: What happened to the distinction between primary and secondary research?, March 22, 2019.

[4] The Office of the Victorian Information Commissioner, Artificial Intelligence and Privacy, 2018.


Categories: Blockchain

MicroStrategy Bitcoin Conference

This document is designed to serve as a resource to help you navigate a corporate bitcoin strategy. It draws from MicroStrategy’s experience with its BTC initiative, but it is not the actual project roadmap used by MicroStrategy.


Access Control in a Big Data Context, IV

ACCESS CONTROL IN BIG DATA

Bringing Privacy Regulation into an AI World, Part 4: Access Control in a Big Data Context

This seven-part series explores, from a Canadian perspective, options for effective privacy regulation in an AI context.

Access control, the ability to control access to information, has long been the primary tool of online security. The great advantage of access control is that it tells you who has access to what information at any given time. Its limitation is that it cannot tell you what the user does with the data, and in what context. Once write access or editorial access is given, an authorized user can in theory do anything with the data – it can be rewritten, deleted, or copied and passed on. Simply put, access control is only effective as long as data doesn’t leave its source realm, or its initiator’s sphere of influence.
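
To illustrate the limitation, here is a toy sketch of a conventional access-control check (the users, resources, and permissions are invented): it can decide whether a user may read or write a record, but once access is granted it records nothing about what the user subsequently does with the data.

```python
# A toy access-control list: user -> set of (resource, permission) pairs.
acl = {
    "alice": {("customer_records", "read"), ("customer_records", "write")},
    "bob":   {("customer_records", "read")},
}

def is_allowed(user, resource, permission):
    """Classic access control: answers only 'who can touch what'."""
    return (resource, permission) in acl.get(user, set())

print(is_allowed("bob", "customer_records", "write"))    # False: access denied
print(is_allowed("alice", "customer_records", "write"))  # True: but nothing here records
# whether alice then copies, resells, or links the data once she has it.
```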

As AI and data use continue to evolve, the access control model is becoming outmoded. Data moves far beyond its origination point, often passing rapidly from collector to purchaser to further purchasers in a chain of unregulated data trading. Data brokers buy web and social media traffic data, combining it with data such as customer support records, public records, and phone and Internet metadata to create highly detailed profiles of hundreds of millions of consumers. AI allows a data scientist to make inferences about consumer preferences and sort consumers into multiple categories, passing these on to retailers for targeted advertising.

There’s a world of difference between sharing your data with your favourite store and consenting to its unrestricted use. Yet these are the only options offered by access control.

Individuals usually grant access to their information so that a business may provide services. Often the customer cannot access these services without opting in to information sharing. If you give your email to a clothing store, you can expect to receive advertising related to the products they sell. But your information rapidly travels far beyond the company you made your initial agreement with. There’s a world of difference between sharing your data with your favourite store and consenting to its unrestricted use. Yet these are the only options offered by access control. Until we mature that model, data protection will be all or nothing. If you don’t want to share your data, don’t sign up for Amazon or Google – if you do choose to use these services, your personal data may be used and shared for nearly any purpose that serves them.

By using access control to regulate AI and big data, we’re trying to solve today’s challenges with yesterday’s tools. Access control only addresses the problem of unauthorized access. We need different tools to address the problem of unauthorized use of data. In my next post I will explore options for better control of the use of data in an AI context.


Big Data’s Big Privacy Leak – Metadata and Data Lakes, Part 3

Bringing Privacy Regulation into an AI World, Part 3: Big Data’s Big Privacy Leak: Metadata and Data Lakes

This seven-part series explores, from a Canadian perspective, options for effective privacy regulation in an AI context.

For a long time, access control has been the principal means of protecting personal information. Fifty years ago, this meant locked file cabinets. Today, much of our personal data is protected by passwords. This system, refined over the past fifty years, has been highly effective in securing data and minimizing data breaches. But the advent of big data and AI has moved the goalposts. Access control cannot protect all of the personal data we reveal as we navigate the internet. Further, most internet users are now more concerned about how the companies to which they entrust their personal information are using it than about the risk of data theft. To adapt to this rapidly evolving digital environment, it will be necessary to rethink access control and develop stronger practices for controlling the use of personal data.

To adapt to this rapidly evolving digital environment, it will be necessary to rethink access control and develop stronger practices for controlling the use of personal data.

Many of our daily online activities are regulated by passwords. They safeguard our online lives, giving us access to everything from our smartphones and bank accounts to the many websites where we shop and entertain ourselves. Passwords are keys securing our personal information and property. The security they give us is known as access control.

Yet there is one type of personal data that passwords cannot protect: the traces we leave every time we use the Internet and phone networks.  The details of our activity as network users are known as metadata. We can keep our personal information under cyber-lock and key, but not our metadata. We can erase browser cookies, but the search engine’s log of our browsing patterns and search keywords remains.
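
As a rough illustration – the values are invented, and real providers’ log formats vary – here is the kind of metadata a search provider or network operator typically retains about a single request, none of which is protected by the user’s password:

```python
# Hypothetical metadata retained for one search request.
search_log_entry = {
    "timestamp": "2020-08-04T14:23:11Z",    # when the request was made
    "client_ip": "203.0.113.42",            # network address, hinting at location
    "user_agent": "Mozilla/5.0 (...)",      # device and browser fingerprint
    "query": "walk-in clinic near me",      # search keywords
    "results_clicked": ["example-clinic.ca"],
    "session_id": "a91f...",                # links this request to the rest of the session
}
# Individually these fields look innocuous; accumulated over weeks of browsing they
# form a detailed behavioural profile that erasing cookies does not remove.
```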

The ground rules of personal data protection have not changed, despite general confusion about how they apply in rapidly-changing contexts. Fair information principles, the bedrock of Canadian privacy legislation, state that organizations should only collect, use, share, and retain personal information for specific purposes to which individuals have consented. Any information, or combination of information, that is detailed enough to potentially identify a person is considered personal information, and these rules apply.

Yet as larger and larger volumes of data are collected and aggregated by big data initiatives, it becomes more and more difficult to define precisely what is considered personal information. “Data lakes” – massive repositories of relatively unstructured data collected from one or several sources, often without a specific purpose in mind – are a highly valuable asset for companies, providing a wide variety of data for potential future analysis, or for sale to other companies.

Data lakes often contain a mix of metadata and personal content. In combination, these can frequently identify specific individuals. For example, publicly available and searchable databases of Twitter activity show tweets by geographic location – positions so specific as to reveal street addresses. In the commercial realm, big box retailers use customers’ debit and credit card numbers to link their various purchases, and have developed customer sales algorithms so refined that they can identify the purchase patterns of pregnant women and send them coupons for baby products. Personally-identifiable data is of far more value to marketers than aggregate data, and powerful AI technologies can be harnessed to re-identify anonymous data.
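
The following sketch uses entirely hypothetical data to show how easily two “anonymous” datasets sitting in the same data lake can be joined to re-identify a person: location-tagged social media posts on one side, loyalty-card transactions on the other.

```python
# Hypothetical "anonymous" datasets sitting in the same data lake.
geotagged_posts = [
    {"handle": "@birdwatcher88", "lat": 43.6629, "lon": -79.3957, "time": "2020-07-01T18:05"},
]
loyalty_transactions = [
    {"card_id": "C-5521", "store_lat": 43.6630, "store_lon": -79.3958, "time": "2020-07-01T18:07"},
]

def close_in_space_and_time(post, txn, max_deg=0.001):
    """Very crude proximity check: near-identical coordinates within the same hour."""
    return (abs(post["lat"] - txn["store_lat"]) < max_deg
            and abs(post["lon"] - txn["store_lon"]) < max_deg
            and post["time"][:13] == txn["time"][:13])

# Linking the two datasets ties a pseudonymous social handle to a loyalty card,
# and through the card to a real name, address, and purchase history.
matches = [(p["handle"], t["card_id"])
           for p in geotagged_posts for t in loyalty_transactions
           if close_in_space_and_time(p, t)]
print(matches)   # [('@birdwatcher88', 'C-5521')]
```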

Legally, personal data can only be collected and used for specific purposes to which individuals have given consent. AI systems, however, blur the line between anonymous data and personal information by making it possible to identify individuals and infer more detailed personal information by combining data from multiple sources. Controlling access to data does not address the most significant privacy risks of AI initiatives. To protect privacy in a big data world, it will be necessary to develop more sophisticated strategies to govern the use and sharing of personal data, as I will explore in my next posts.


Series: Bringing Privacy Regulation into an AI World

Bringing Privacy Regulation into an AI World

Over the past decade, privacy has become an increasing concern for the public as data analytics have expanded exponentially in scope. Big data has become a part of our everyday lives in ways that most people are not fully aware of and don’t understand. Governments are struggling to keep up with the pace of innovation and figure out how to regulate a big data sector that supersedes national borders.

Governments are struggling to keep up with the pace of innovation and figure out how to regulate a big data sector that supersedes national borders.

Different jurisdictions have taken different approaches to privacy regulation in the new context of big data, machine learning, and artificial intelligence (AI). The European Union is in the lead, having updated its privacy legislation, established a “digital single market” across Europe, and resourced a strong enforcement system. In the United States, privacy remains governed by a patchwork of federal and state legislation, largely sector-specific and often referencing outdated technologies. The US Federal Trade Commission is powerful and assertive in punishing corporations that fail to protect data from theft, but has rarely attempted to regulate the big data market. Canada’s principle-based privacy legislation remains relevant, but the Office of the Privacy Commissioner (OPC) acknowledged recently that “PIPEDA [the Personal Information Protection and Electronic Documents Act] falls short in its application to AI systems.”[1] As the OPC states, AI creates new privacy risks with serious human rights implications, including automated bias and discrimination[2]. Given the pace of technological innovation, there may not be much time left to establish a “human-centered approach to AI.”[3]

This series will explore, from a Canadian perspective, options for effective privacy regulation in an AI context. I will discuss the following topics: 

  1. Do we need to legislate AI?
  2. Are privacy principles compatible with AI?
  3. Big data’s big privacy leak – metadata and data lakes
  4. Access control in a big data context
  5. Moving from access control to use control
  6. Implementing use control – the next generation of data protection
  7. Why Canadian privacy enforcement needs teeth

[1] Office of the Privacy Commissioner of Canada, Consultation on the OPC’s Proposals for ensuring appropriate regulation of artificial intelligence, 2020.

[2] Ibid.

[3] G20 Ministerial Statement on Trade and Digital Economy, 2019.


Do We Need to Legislate AI?

Bringing Privacy Regulation into an AI World, Part 1

This seven-part series explores, from a Canadian perspective, options for effective privacy regulation in an AI context.

In recent years, artificial intelligence (AI) and big data have subtly changed many aspects of our daily lives in ways that we may not yet fully understand. As the Office of the Privacy Commissioner (OPC) of Canada states, “AI has great potential in improving public and private services, and has helped spur new advances in the medical and energy sectors among others.”[1] The same source, however, notes that AI has created new privacy risks with serious human rights implications, including automated bias and discrimination. Most countries’ privacy legislation, including Canada’s, was not written with these technologies in mind. The privacy principles on which Canada’s privacy laws are based remain relevant, but are sometimes difficult to apply to a complex new situation. Given these difficulties, the OPC has questioned whether Canada should consider defining AI in law and creating specific rules to govern it.[2]

I would argue, in contrast, that the technological neutrality of Canada’s privacy legislation is the reason it has aged better than laws in other jurisdictions, notably the US, that reference specific technologies. The European Union’s recently updated and exemplary privacy legislation deliberately takes a principle-based approach rather than focusing on particular technologies. 

I thoroughly support the principles of technological neutrality, and I do not recommend the creation of specific rules for AI.

I thoroughly support the principles of technological neutrality, and I do not recommend the creation of specific rules for AI. Technologies are ephemeral and volatile, and they change rapidly over time; what the technological concept of “the cloud” meant ten years ago is very different from what exists today. AI is evolving all the time. Creating a legal definition of AI would make privacy legislation hard to draft, and harder to adjudicate. Doing so could easily turn any court case on privacy and AI into a fist-fight between expert witnesses offering competing interpretations. 

AI adds a new element to the classic data lifecycle of collection, use, retention, and disclosure: the creation of new information through linking or inference. For privacy principles to be upheld, data created by AI processes needs to be subject to the same limitations on use, retention, and disclosure as the raw personal data from which it was generated. It is important to note that, conceptually, AI is not a form of data processing; rather, it is a form of collection. AI’s importance in the privacy domain lies in its impact: it expands on the data collected directly from individuals.

An example:

Alex is a member of a robotics club. She provided her personal information on sign-up. The club, which has various locations in different cities, offers its patrons a mobile app to locate the nearest venue; Alex signed up for the app. The club’s AI analytics systems can track the stops that Alex makes en route to the club, and infer her preferred stopping places – the library, a tea shop, a gas station.

The robotics club has a café, and wants to know what its patrons like so that it can serve them. The club’s data processing has ascertained that Alex stops frequently at a tea shop on the way to the club; it infers that she likes tea. People share ideas and book recommendations through the app, and the club also makes recommendations to patrons. Alex has recommended Pride and Prejudice; the club’s AI infers that she would also enjoy Jane Eyre, and recommends it to her. The club’s AI system also searches her public Facebook posts to analyze her interests and recommend other books and products she might like.
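
A minimal sketch of the kind of inference described above (the data and threshold are invented): the club’s analytics turn raw location pings, which Alex shared only to find the nearest venue, into a new piece of personal information about her tastes.

```python
from collections import Counter

# Hypothetical stops detected on Alex's routes to the club over a month.
stops = ["tea_shop", "library", "tea_shop", "gas_station", "tea_shop", "library", "tea_shop"]

def infer_preferences(stop_history, min_visits=3):
    """Derive new personal information (preferences) from location metadata."""
    counts = Counter(stop_history)
    return [place for place, n in counts.items() if n >= min_visits]

print(infer_preferences(stops))   # ['tea_shop'] -> "Alex likes tea", data Alex never provided
```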

AI systems go far beyond analyzing data that individuals have voluntarily provided. They frequently collect data indirectly, for example, by collecting public social media posts without individuals’ knowledge or consent. Through linking and inference, AI uses data from various sources to create new data, almost always without the consent of data subjects. This creation of knowledge is a form of data collection. If regulation can deal with privacy issues at the level of collection, it has also dealt with use, since collection is the gateway to use.  

Therefore, I recommend changing the legal definition of data collection to include the creation of data through linking or inference, as well as indirect collection. Under Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA), organizations may only collect personal data for purposes identified to individuals before or at the time of collection. Defining the creation of data as a form of data collection would mean that information could be created through AI analytics only for specified purposes to which data subjects have consented.
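
To illustrate how this definition could work in practice – the names and structure here are hypothetical, not a reference to any existing system – data created by inference or linking would be registered as a collection event and blocked unless it maps to a purpose the data subject has already consented to:

```python
# Hypothetical consent register: data subject -> purposes they have consented to.
consented_purposes = {
    "alex": {"membership_management", "venue_finding"},
}

def collect(subject, attribute, value, purpose, source):
    """Treat direct, indirect, and inferred data identically: all of it is 'collection'."""
    if purpose not in consented_purposes.get(subject, set()):
        raise PermissionError(
            f"Cannot collect '{attribute}' ({source}): no consent for purpose '{purpose}'")
    return {"subject": subject, "attribute": attribute, "value": value,
            "purpose": purpose, "source": source}

# Direct collection at sign-up, for a consented purpose: allowed.
collect("alex", "address", "123 Main St", "membership_management", source="sign-up form")

# Data created by inference, for a purpose never consented to: blocked under the proposed definition.
try:
    collect("alex", "beverage_preference", "tea", "targeted_marketing", source="inference")
except PermissionError as err:
    print(err)
```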

To summarize, I do not believe that it is advisable or necessary to create specific legislation to govern privacy in the context of AI systems. The creation of new information through data analytics can be governed effectively by the same principles that govern the direct collection of personal data. 


[1] Office of the Privacy Commissioner of Canada, Consultation on the OPC’s Proposals for ensuring appropriate regulation of artificial intelligence, 2020.

[2] Ibid.


Categories: Election

5 Questions About Elections and Disinformation

With a US election campaign ramping up, many people around the world are paying attention to the impact of misinformation and disinformation on democracy.

Electoral management bodies, including election officials at all levels of government, are looking to social media monitoring to battle election interference through disinformation campaigns. Three types of attacks are commonly seen during election years:

In my e-book, Using Social Media Data to Transform Election Monitoring, I discuss five questions of context that are key in tracing and combating misinformation and disinformation.

The who, what, when, where, and why are critical in detecting mis/disinformation; a rough illustrative sketch follows the list below.

  1. Who: Individuals who spread misinformation/disinformation, and the connections between them
  2. What: What they’re saying (text, images, emojis, logos)
  3. When: Time relative to election process – time of day / weekdays
  4. Where: The platform(s) culprits use, and their geographical location
  5. Why: The context of individuals’ discourse: other conversations they are involved in, and other topics they discuss
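
As a rough illustration only (the field names are my own, not taken from the e-book), a monitoring record that captures these five dimensions for a single flagged post might look like this:

```python
# Hypothetical record for one flagged social media post.
flagged_post = {
    "who":   {"account": "@example_account", "connected_accounts": ["@amplifier1", "@amplifier2"]},
    "what":  {"text": "Polling stations are closed tomorrow", "media": ["image_001.png"], "emojis": []},
    "when":  {"posted_at": "2020-10-28T22:14:00Z", "days_before_election": 6},
    "where": {"platform": "Twitter", "geo_estimate": "Ohio, US"},
    "why":   {"related_threads": ["voter-id rumours"], "other_topics": ["ballot fraud claims"]},
}
# Aggregating such records over time lets analysts spot coordinated accounts,
# recurring narratives, and the timing patterns typical of disinformation campaigns.
```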

To learn more about election social media monitoring, reach out to Dr. Waël Hassan, CEO and Chief Data Scientist of KI Design. You can also grab a copy of my e-book, Using Social Media Data to Transform Election Monitoring, available on Amazon Kindle today.


Categories: Privacy

A 3D Test for Evaluating COVID Alert: Canada’s Official Coronavirus App

Great news – Canada has just released its free COVID-19 exposure notification app [1], COVID Alert. Several questions now arise: Is it private and secure? Will it be widely adopted? And how effective will it be at slowing the spread of the virus? We have evaluated the COVID Alert app against three dimensions: Concept, Implementation, and User Experience. We grade the concept as leading-edge (A+), the implementation as just adequate (C), and the user experience as less than satisfactory (D).

Ontario Digital Service (ODS) and Canadian Digital Service (CDS) built the app based on a reference implementation by Shopify, with CDS taking operational responsibility and ownership. The security architecture was reviewed by BlackBerry and Cylance. Health Canada performed an Application Privacy Assessment[2], which was reviewed by the Office of the Privacy Commissioner of Canada[3] and the Information and Privacy Commissioner of Ontario.

HOW COVID ALERT WORKS

  1. Via Bluetooth, the app keeps track of the phones that have come into close physical proximity with yours.
  2. If you test positive for COVID-19, you can enter a code into the app to declare your status.
  3. The app will check daily to see if anyone you’ve been near has reported testing positive.
  4. If you’ve been near an infected person in the past two weeks, you’ll get a notification (a simplified sketch of this exchange follows the list).
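
As a much simplified sketch of the idea behind this kind of decentralized exposure notification – not COVID Alert’s actual protocol or cryptography – each phone broadcasts rotating random identifiers, stores the identifiers it hears nearby, and later checks them against identifiers published by users who report a positive test:

```python
import secrets

def new_identifier():
    """Rotating random identifier broadcast over Bluetooth (simplified: no real crypto)."""
    return secrets.token_hex(16)

# Phone A broadcasts identifiers; phone B stores whatever it hears nearby.
phone_a_broadcasts = [new_identifier() for _ in range(3)]
phone_b_heard = set(phone_a_broadcasts[:2])      # B was near A for part of the day

# Later, A tests positive and uploads its recent identifiers to the central server.
server_positive_ids = set(phone_a_broadcasts)

def check_exposure(heard, published_positive):
    """Daily check done locally on the phone: any overlap means possible exposure."""
    return bool(heard & published_positive)

print(check_exposure(phone_b_heard, server_positive_ids))   # True -> notify the user
```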

At present, there isn’t enough data to provide a proper assessment of COVID Alert

However, I can offer my thoughts on the three aspects of design mentioned above:

CONCEPT

Canada got it right – a successful COVID-19-related app that focuses primarily on its benefit to users, i.e. notification, rather than tracking. A tracking app needs to track everyone’s routes and interactions all the time; this captures way too much private data, making it a tempting treasure-trove to hackers. Privacy concerns will impede adoption of tracking apps.

COVID Alert side-steps these concerns by focusing only on notification. Most other countries that have developed an app have built a tracking tool to be installed on a cell phone, with notification included as a feature. Canada, on the other hand, has built a notification-only app. The fact that its use is voluntary will further boost public confidence.

Grade for concept: A+

IMPLEMENTATION

Apps may be built for the public, for healthcare providers, or for business use. Canada has chosen to build an app for the public. For apps created for the business or healthcare sectors, adoption is a given. The main challenge for a public app is: Will the public adopt it? It will need to reach a critical mass of adoptees to be successful. Without that critical mass, the app will provide little to no benefit.

COVID Alert’s server and app are both open source. This is an encouraging decision: it makes the app business-friendly and improves public trust by allowing expert scrutiny of the code.

The choice to focus on adoption by individuals is a strong point for privacy, but a challenge to effective implementation. In contrast, an app designed for business, aimed at detecting outbreaks connected to particular business locations, might raise more complex privacy issues, but could be implemented much more widely with support from the private sector.

The Canadian government had the option of implementing a COVID-19 data network between citizens, businesses, and public health. This app, unfortunately, only covers the individual, with a manual link to public health. How could this have been improved? A data exchange platform would have been a wiser choice, as it would help boost business adoption.

Grade for implementation: C

USER EXPERIENCE

While I’m not an expert, I’d say that the app user experience is marked by three things:

Grade for usability: D

Takeaway and Next Steps

The COVID Alert app is a positive and important concept; from a conceptual standpoint, Canada is ahead of all other solutions to date. Ideally, its implementation would go beyond the boundaries of an app. The current approach creates a basis for expansion. I intend to fully leverage the federal app by building an end-to-end solution, IoPlus, that focuses on business adoption.

References:

[1] https://www.canada.ca/en/public-health/services/diseases/coronavirus-disease-covid-19/covid-alert.html

[2] https://www.canada.ca/en/public-health/services/diseases/coronavirus-disease-covid-19/covid-alert/privacy-policy/assessment.html

[3] https://nationalpost.com/news/canada/hackers-target-canadians-with-fake-covid-19-contact-tracing-app-disguised-as-official-government-software

To read more about Wael’s outbreak notification design, follow this link. To learn about enterprise corporate compliance, feel free to download Privacy in Design: a Practical Guide for Corporate Compliance from the Kindle Store.