THE HITCHHIKER’S GUIDE TO PRIVACY BY DESIGN
Barbara Peruskovic
ABOUT THE AUTHOR Barbara Peruskovic has been working in the field of information architecture for over 30 years. At the age of 12 she learned the principles of programming ( read – did not have a computer so wrote her code on paper), at the age of 14 she wrote her first application ( read – high school assignment to track ethnicity of her fellow students). As this last one was part of a bigger scale ‘mistake’ ( read – massive war on local scale somewhere in Europe), she counted her blessings and continued her education in Western Europe. Not used to peace and quietness, after rushing through high school and a freshman year at the Mathematics faculty, she decided to try her luck tackling Y2K problems in the corporate world (despite all the odds this did not lead to any major galactic catastrophe). By total accident ( read – IQ measurements by an inadequate recruitment agency ) she ended up at the Information Management department of a great hidden software company. After learning the craft ( read – being called youngster for 10 years), she started her own consulting service, providing advisory services to help organisations form their own solutions, visions and teams while facing data challenges. With years of hands-on experience with different tools, techniques and organisation models, she learned to favour the approach of practical wisdom. Meaning that one can learn the principles of action, but applying them in the real world, in situations one could not have foreseen, requires experience of doing ( read – she made many mistakes and now can warn/prevent others from making the same).
Being diagnosed as a Philomath ( read – officially addicted to learning) she followed courses in Applied Psychology, several technologies ( read – you know, the big 4’s ) and by total accident Ethics and Privacy. Combined with her love for mathematical models, data and engineering, this led to a true obsession. Nowadays, she is teaching Privacy by Design in the area of Data & Analytics and consulting in Ethical Data Governance ( read – she can finally choose assignments and customers). All wisdom in her work and this paper is to be credited to great contributors from her professional network and the academic society. She is just the implementer of privacy and governance principles in an exciting but troubled world of data. (P.S. Her only truly unique accomplishment is a vertical catwalk of some public building in Amsterdam ( read – climbing 60m head down attached to a small rope, which in some tiny, hidden part of the universe is considered to be a profession).)
CONTENTS
ABOUT THE HITCHHIKER’S GUIDE
FOREWORD
THE GREAT QUESTION
THE VALUE OF PERSONAL DATA
DATA ECONOMY IN THE REGULATORY EU UNIVERSE
DESIGNING THE PRINCIPLES OF TRUST
Engineering ethics into data
Privacy by Design (PbD)
Terminology of privacy
Privacy risk management framework
Privacy Enhancing Technologies (PETs)
FRAMEWORKS FOR IMPLEMENTING PRIVACY
VALUE INCREASING EFFECT OF PRIVACY
Conclusion
THE RESTAURANT AT THE END OF THE UNIVERSE
ABOUT PROTEGRITY
“In many of the more relaxed civilizations on the Outer Eastern Rim of the Galaxy, the HitchHiker’s Guide has already supplanted the great Encyclopaedia Galactica as the standard repository of all knowledge and wisdom, for though it has many omissions and contains much that is apocryphal, or at least wildly inaccurate, it scores over the older, more pedestrian work in two important respects. First, it is slightly cheaper; and secondly it has the words DON’T PANIC inscribed in large friendly letters on its cover.” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
FOREWORD By Ann Cavoukian, Ph.D. The Hitchhiker’s Guide to Privacy by Design is by far one of the most creative and delightful texts to read! I have always linked the importance of privacy to enabling innovation and creativity regarding the ways in which we can make technology work for us – Protegrity is showing us that effectively transferring this knowledge benefits greatly from taking a design approach. I applaud Barbara Peruskovic, the author of this guide, for her ingenuity and dedication to teaching in the areas of Privacy by Design and data analytics. For anyone overwhelmed by the magnitude of the GDPR, this is indeed a ‘must read’. What could be a better stage on which to demystify this complex new approach to privacy rights than the entire universe – a space galaxy! The saying ‘a picture is worth a thousand words’ rings true with the Hitchhiker’s Guide to Privacy by Design – it uses colourful images that convey the meaning and essence of Privacy by Design, elements of the GDPR, and privacy ethics more effectively than a lengthy 10,000+ word description could ever achieve. Bravo Barbara, Protegrity and the entire team! Remember that privacy forms the foundation of our freedom – you cannot have free and democratic societies without privacy. By embedding it into the design of our operations, we can be assured that privacy will be preserved – now, and well into the future. Ann Cavoukian, Ph.D., LL.D. (Hon.), M.S.M., is a distinguished Expert-in-Residence at the Privacy by Design Centre of Excellence, Ryerson University.
DON’T PANIC
THE GREAT QUESTION “The Answer to the Great Question… Of Life, the Universe and Everything… Is… Forty-two,’ said Deep Thought, with infinite majesty and calm.” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
Privacy has always been context related, but our data hungry society has added various extra dimensions to it. The perception of it today is not only considered an ethical question but it also poses various economical, legal and technical questions. And the main answer to it all – required by society and law – is Privacy by Design. Unfortunately, it is very hard to find out how to achieve it. The aim of this Guide is to provide a repository of the knowledge and wisdom of many brave scientists, researchers and practitioners who have reached for it.
THE VALUE OF PERSONAL DATA “The History of every major Galactic Civilisation tends to pass through three distinct and recognisable phases, those of Survival, Inquiry and Sophistication, otherwise known as the How, Why, and Where phases.” Douglas Adams, The Restaurant at the End of the Universe
Technological development is not only increasing our ability to store and exploit data, but is also nudging us to share tremendous amounts of personal information. In a current economy that classifies data as an asset, a commodity and even a currency, personal data has a huge value potential. A study from BCG (ref 1) states that the value generated by applications built on personal data “can deliver a €330 billion annual economic benefit for organisations in Europe by 2020”. This growth is under pressure as consumers worry about violation of their privacy – caused by data breaches as well as by misuse of their personal data. The challenge is to establish an environment in which the fundamental human right to privacy and safety is also applied in the context of data – data privacy protection – while the economic value is maintained.
DATA ECONOMY IN THE REGULATORY EU UNIVERSE “In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.” Douglas Adams, The Restaurant at the End of the Universe
Being primarily an economic union, the EU acknowledged that the opportunities related to a digital identity market are crucial for the region to compete on a global level in the data-driven economy. In order to do so the EU must “make sure that the relevant legal framework and the policies, such as on interoperability, data protection, security and IPR are data-friendly, leading to more regulatory certainty for business and creating consumer trust in data technologies” (ref 2). As an early adopter of embedding economic aspects at a political level, the EU is setting up an extended data regulatory framework. With this small universe of directives and regulations, the EU is pioneering on the frontiers of technology-based legislation.
[Figure: A snapshot of the EU data legislation universe – an infographic mapping EU directives and regulations with data-sharing obligations across domains: privacy (the General Data Protection Regulation (GDPR), the e-Privacy Directive 2002/58/EC, the proposal for a Directive on contracts for the supply of digital content); competition (Merger Regulation (EC) No 139/2004, Article 101 TFEU, Technology Transfer Regulation (EU) No 316/2014, the essential facilities doctrine); public sector (Article 15 TFEU, Re-Use Directive 2003/98/EC); aviation (Advance Passenger Information Directive 2004/82/EC, Passenger Name Record Directive (EU) 2016/681); intellectual property (InfoSoc Directive 2001/29/EC, Database Directive 96/9/EC, Trade Secrets Directive (EU) 2016/943); consumer rights and food (Food Information to Consumers Regulation No 1169/2011); energy (Internal Market in Electricity Directive 2009/72/EC, Internal Market in Natural Gas Directive 2009/73/EC, Energy Labelling Directive 2010/30/EU, Energy Efficiency Directive 2012/27/EU); automotive (Vehicle Emissions Regulation (EC) No 715/2007, Car Labelling Directive 1999/94/EC); chemicals (REACH Regulation (EC) No 1907/2006); financial services (Solvency II Directive 2009/138/EC, PRIIPs Regulation No 1286/2014, MiFID II 2014/65/EU, MiFIR); pharmaceuticals (Medicinal Products Directive 2001/83/EC); environment (Public Access to Environmental Information Directive 2003/4/EC); spatial (INSPIRE Directive 2007/2/EC); transport (Intelligent Transport Systems (ITS) Directive 2010/40/EU).]
A snapshot of the EU data legislation universe
But for now, let’s ignore the universe and focus on the comet that is going to hit us first – GDPR. The General Data Protection Regulation entered our orbit on May 25th, 2016 and its exact date of impact is set for May 25th, 2018. GDPR is actually not a single comet – it is more like an asteroid belt consisting of 99 interlinking articles. To give us a simple overview, CNIL provided the beautiful picture opposite, showing it in orbit (as seen from France) (ref 3). The GDPR has come to us to deal with the protection of natural persons with regard to the processing of personal data and with the free movement of such data. Instead of giving the details of all the articles, the essence of GDPR boils down to:
Broader definitions of personal data ‘Personal data’ means any information relating to an identified or identifiable natural person. This not only includes PII (personally identifiable information) like name, address, birth date or Social Security number, but also a person’s digital location data, an online identifier, or one or more data elements specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.
Increased territorial scope The GDPR applies to businesses established in the EU, as well as to businesses based outside the EU that offer goods and services to, or that monitor, individuals in the EU.
Consent The consent of a data subject remains one of the legal grounds for the processing of personal data under the GDPR, with strict requirements.
Right of individuals and transparency GDPR requires that a person whose data is being processed be sufficiently informed, to ensure fair and transparent data processing. The information must be provided to the data subject in a concise, transparent, intelligible and easily accessible form. Part of the expanded rights outlined by the GDPR is the right for individuals to obtain confirmation as to whether or not personal data concerning them is being processed, where and for what purpose. The individual has the right to rectification of the data without undue delay if an error has been found. The individual will have the right to request erasure of his/her personal data on several grounds (and within a short time period!).
Breach notification Under the GDPR, breach notification will become mandatory in all member states where a data breach is likely to “result in a risk for the rights and freedoms of individuals”. This must be done within 72 hours of first having become aware of the breach. The organisations will also be required to notify their customers without undue delay.
A SIMPLE OVERVIEW OF GDPR – AS SEEN BY CNIL
Data Protection Officers Under specific conditions, the GDPR requires you to appoint a Data Protection Officer or DPO. This is the case if your organisation is a public authority, if you perform large-scale data processing with systematic monitoring of individuals or if you process special categories of personal data like health data or criminal records.
One-stop-shop Under the one-stop-shop mechanism, organisations in multiple EU countries are assigned to one lead supervisory Data Protection Authority (DPA) in the location of the organisation’s main establishment.
Monetary impact of non-compliance The GDPR establishes a tiered approach to penalties for breaches, which enables the DPAs to impose fines of up to 4% of annual worldwide turnover or EUR 20 million, whichever is greater.
Data protection by design and default While all the other points refer to operational and/or procedural implications, this means implementing appropriate technical and organisational measures to safeguard personal data, including limiting access to it, storing it in a pseudonymised format and ensuring data is only used and retained as long as necessary for the purpose for which it was obtained. And also doing it by the principle of Privacy by Design – deciding upfront what personal data you need and how you are going to process it, including how you are going to collect it, store it, share it and dispose of it. The sole purpose of this new data-centric universe is not to spell out how to behave but to engender trust in the digital identity economy, which depends on consumers’ trust. While from legal and organisational angles this may have been done many times before, organisations now need to address how to do this in quite technical data environments like big data, analytics and Artificial Intelligence.
DESIGNING THE PRINCIPLES OF TRUST “A common mistake that people make when trying to design something completely fool proof is to underestimate the ingenuity of complete fools.” Douglas Adams, Mostly Harmless
There are many good explanations of the organisational and legal aspects of GDPR and other legislation. There is a lot of information about which procedures you have to implement to obtain consent, to notify about a data breach, to appoint a DPO, to organise a privacy assessment and audit… All references are mentioned in The Restaurant at the End of the Universe (the appendix). But when it comes to Data Protection by Design (Privacy by Design and Privacy by Default), there seems to be a distinct lack of clear answers. And by “clear answers”, we mean the guidance and instructions that the people who design, develop and implement the software systems actually understand. Let alone any (except academic) literature about implementing Privacy by Design within (big) data and analytics environments – the technologies that dominate and determine the world
we are living in. The systems and software developers, the engineers, the data architects and the data scientists need the answer to “How”. And most of all, they have to be made aware that they have to ask this question. Just because we all think that everybody in the software industry should know about privacy does not mean this is actually the case. Recent empirical work by Irit Hadar et al. (ref 4) shows that almost 50% of software engineers have no knowledge of privacy law(s) and only 14% can actually reference the law. The conclusion is that the engineering mindset often limits privacy, and the measures taken are mostly related to security rather than actions to ensure rightful processing. Of course, a lot of this is caused by the fact that the commercial culture in most companies has largely ignored privacy principles up until now. And even though the discipline of privacy engineering has existed for almost two decades, only recently is it being taught at (still only a few) universities. But the most disturbing fact that this research reveals is the considerable evidence that software engineers avoid taking responsibility for privacy. It seems that engineers do not understand that their decisions “can unleash technology upon the world that can significantly affect fundamental rights” (ref 5).
Engineering ethics into data “I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
So before even starting to talk about “how”, let’s talk about “why”. Most companies do have mandatory privacy awareness training or sessions, but they mostly just explain the norms and the rules at such an abstract level that an engineer cannot relate them to daily practice. In the case of a millennial data scientist, these kinds of corporate rules may even conflict with their own informational and moral norms. A generation that has been logging onto social media platforms since they were children will, and must, have another view on the privacy aspects of sharing data. And what many engineers (of the old and the young generation) have in common is that they truly believe that the data captured in their algorithms and systems is harmless. Often subjected to the sterile logic of mathematics and computing, they see it as non-informative until used, and detached from consequences in ‘real life’. But as professor Mireille Hildebrandt of the Vrije Universiteit Brussel (ref 6) states, “we need to start thinking of privacy as the protection of uncomputability of the self, defending what cannot be captured, but can be destroyed”.
So, let’s follow the logic of engineering to explain the relationship between values, norms and technology. We all know the problem of bad data quality that has been haunting information systems since their beginnings: if the data is incorrect, so will be any decision based on it. Over time, we learned and developed different techniques to deal with these problems. They are not always solvable, but at least from a technical and organisational standpoint we all understand their impact.
Nowadays, technologies like big data, machine learning and advanced profiling analytics are still in their early adolescence. As such they are vigorous and refreshing, and bring opportunities in terms of convenience, living standards and innovation. But, according to Hildebrandt and Koops (ref 7), their scale and scope are also creating new vulnerabilities and enlarging existing ones. The new types of vulnerability are:
• Incorrect categorisation
• Privacy and autonomy
• Discrimination and stigmatisation
• Lack of due process
Incorrect categorisation (e.g. being labelled as having a 70% chance of developing cancer or causing a car accident) can have far-reaching consequences like being refused health or car insurance. The causes of incorrect categorisation may be various and difficult to detect – due to ‘errors’ inherent in computing techniques based upon stochastic inferences. The common identifiers of a group or past behaviour trend analysis may determine how an individual is treated even if other (unknown) identifiers exclude them from the group or trend. The so-called ‘autonomy trap’ (KARSKY, ref 8) can be caused when people don’t know that, based on profiling, they are offered a limited choice (e.g. being offered a different insurance policy as a male or female). So even if they thought they took a well-informed decision, the decision option is being controlled without their knowledge. Discrimination is about finding the difference that makes a difference. It may not always be wrong or unlawful but it may lead to stigmatisation and increase social discrepancy. It may deprive people of equal opportunities or burden them with additional risks, and it may also impact their standard of living. “For example, low-quality products may be offered to a person because she is assumed not to have the time or the intelligence to do a product-comparison or because she is supposed to lack the money for better quality.” (ref 9)
Often there is also no due process that gives a person an effective remedy in case they have been unfairly treated by an automated process. The so-called ‘Computer says no’ problem extends to people being denied fundamental services (for example, access to education or health) without the opportunity to object. In order to try to minimise the impact of these vulnerabilities, software engineers have to expand the traditional set of system requirements (functionality, efficiency, ease of use etc.) with the following informational value requirements:
• Methodological integrity – using rigorously sound and contestable methodologies: mathematical and empirical software verification
• Fairness – testing for bias in the training set and for bias in the learning algorithm (see the sketch below)
• Accountability – providing (con)testability of both data sets and algorithms
• Data Protection by Default – engineering privacy as the default setting
• Data Protection by Design – engineering Privacy by Design as a system requirement
The first three principles are not really new – they have always existed, but now they have to be explicitly documented in order to be able to explain a decision and be transparent, as stated in Article 12 of GDPR. Data Protection by Default and Design are newly-coined terms in GDPR, but they refer to the privacy engineering principle of Privacy by Design that was introduced back in the 1990s.
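To make the fairness requirement above a little more concrete, here is a minimal sketch of testing a training set for bias: it compares positive-outcome rates between groups, in the style of a simple disparate-impact check. The data, column names and the 0.8 threshold are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch: testing a training set for bias before fitting a model.
# Column names, data and the 0.8 threshold ("80% rule") are illustrative assumptions.
import pandas as pd

train = pd.DataFrame({
    "gender":   ["F", "M", "F", "M", "F", "M", "F", "M"],
    "approved": [  1,   1,   0,   1,   0,   1,   1,   1],
})

rates = train.groupby("gender")["approved"].mean()
disparate_impact = rates.min() / rates.max()
print(rates.to_dict(), "disparate impact ratio:", round(disparate_impact, 2))

# Flag the training set for review if one group's positive rate falls far below another's.
if disparate_impact < 0.8:
    print("Potential bias in the training set: document and investigate before training.")
```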
Privacy by Design (PbD)
More than two decades ago, when most of us technology people just worried about Y2K or the internet collapsing, the true visionary Dr. Ann Cavoukian realised that those doom scenarios were not the ones to be worried about. Working as a Canadian privacy expert and as the Information and Privacy Commissioner of Ontario, and with a background in psychology, criminology and law, she realised that the systemic effects of ever-growing information systems would affect our lives in much more profound ways than we ever imagined. In order to realise the true benefits of this technology growth, she recognised the need to incorporate social and human value norms into the design of those technology systems. She developed the Privacy by Design framework – an approach that is characterised by proactive, positive-sum (full functionality) measures. The objectives of this framework are not just to ensure privacy and personal control over one’s information but also to gain a sustainable competitive advantage for the organisations themselves by doing so.
Privacy by Design consists of seven Foundational Principles:
1 Proactive not reactive; preventative not remedial Privacy-invasive events are anticipated and prevented before they actually happen. Focus is on before-the-fact prevention, not remediation after a problem (e.g. a data breach) occurs.
2 Privacy as the default setting This means maximum privacy protection offered as baseline – the maximum degree of privacy is ensured by automatically protecting personal data in any given IT system or business practice. No action is required on the part of the individual to protect their privacy – it is built into the system, by default.
3 Privacy embedded into design Embed privacy into the design and architecture of IT systems and business practices by treating it as any other system requirement (e.g. usability, performance). This way privacy becomes an essential component of the core functionality being delivered.
4 Full functionality – positive-sum, not zero-sum Implementation of privacy does not compromise business goals. All legitimate interests and objectives are accommodated in a positive-sum, win-win manner without unnecessary trade-offs. It is an approach of ‘and’ vs ‘or’: for example, accommodating privacy and security at the same time.
5 End-to-end security – full lifecycle protection The security measurement is to be implemented through the whole information management lifecycle and embedded into the system prior to the collection of the information. All data is to be securely retained, and then securely destroyed at the end of the process, in a timely fashion.
6 Visibility and transparency – keep it open All stakeholders must operate according to any stated promises and objectives and must be subject to independent verification. System component parts and operations must remain visible and transparent to all actors, users and providers alike.
7 Respect for user privacy – keep it user-centric Architects, engineers and operators are to protect the interests of the individual by offering such measures as strong privacy defaults, appropriate notice, and empowering user-friendly privacy options.
[Figure: The 7 Foundational Principles of Privacy by Design – proactive not reactive, preventative not remedial; privacy as the default setting; privacy embedded into design; full functionality – positive-sum, not zero-sum; end-to-end security – full lifecycle protection; visibility and transparency – keep it open; respect for user privacy – keep it user-centric.]
In the last decade Privacy by Design has gained traction in policy circles around the world. In 2010 Privacy by Design was unanimously passed as an International Standard by the International Assembly of Privacy Commissioners and Data Protection Authorities. Since then it has been translated into 40 languages. It was recognised by the U.S. Federal Trade Commission in 2012 and made a mandatory component of the new E.U. GDPR, which comes into effect on May 25, 2018 – a major achievement and testament to its strength. But is that enough? To quote Dr. Ann Cavoukian: “But if that’s all you do, it’s not enough. I want you to go further. When I ask you to do Privacy by Design, it’s all about raising the bar. Doing technical measures such as embedding privacy into the design that you’re offering, into the data architecture, embedding privacy as a default setting. That’s not a legalistic term. It’s a policy term. It’s computer science.” (ref 10)
The actual operationalisation of this principle in the engineering of products and services remains, however, an open question, as its not (yet) standardised nature leaves an open field for misinterpretation. Many parties propose Privacy by Design methodologies promising to be the ultimate solution for embedding privacy, but their reach in terms of audience, level of detail and practicality is rather limited. In order to close the gap between policy makers’ and engineers’ understanding of Privacy by Design, let’s review a (non-)exhaustive guide of the terms and methodologies used.
Terminology of privacy
The terms that are currently used in the context of privacy are often confusing, as they may mean something totally different in the world of software engineering. As many misunderstandings have stemmed from people using the wrong terminology, let’s review some of the most common terms. • Data inventory A data inventory is a fully described record of the data assets maintained by an organisation. The inventory records basic information about a data asset including its name, contents, update frequency, owner/maintainer, data origin, and other relevant details. The details about a dataset are known as metadata. Frequently used synonyms are data catalogue/data dictionary. Note that a data inventory is about all data, not just personal data.
• Data Flow Diagram (DFD) A data flow diagram illustrates how data is processed by a system in terms of inputs and outputs. As its name indicates its focus is on the flow of information, where data comes from, where it goes and how it gets stored.
• Data mapping Data mapping is a special type of data inventory that shows how data from one information system maps to data from another information system. It usually contains the following elements: • List of attributes for the original source of data • A corresponding (or ‘mapped’) list of attributes for the target data • Translation rules defining any data manipulation that needs to happen as information moves between the two sources, such as setting default values, combining fields, or mapping values
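As a rough illustration of the three elements above, the sketch below expresses a data mapping in code: source attributes, their corresponding target attributes, and the translation rules between them. The systems, field names and rules are hypothetical assumptions, not part of any standard.

```python
# Minimal sketch of a data mapping between two hypothetical systems (CRM -> billing).
# Field names and translation rules are illustrative assumptions.
source_to_target = {
    "first_name": ("full_name",    lambda rec: f"{rec['first_name']} {rec['last_name']}"),  # combine fields
    "country":    ("country_code", lambda rec: rec.get("country") or "NL"),                  # default value
    "phone":      ("phone_e164",   lambda rec: rec["phone"].replace(" ", "")),               # value mapping
}

def map_record(rec: dict) -> dict:
    """Apply the translation rules to one source record and return the target record."""
    return {target: rule(rec) for _, (target, rule) in source_to_target.items()}

print(map_record({"first_name": "Ada", "last_name": "Lovelace", "country": "", "phone": "+31 6 1234"}))
```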
• Register of processing Article 30 of GDPR introduces a concept of keeping a register of data processing activities stating that “Each controller and, where applicable, the controller’s representative, shall maintain a record of processing activities under its responsibility”. That record shall contain all of the following information: • the name and contact details of the controller and, where applicable, the joint controller, the controller’s representative and the data protection officer; • the purposes of the processing; • a description of the categories of data subjects and of the categories of personal data; • the categories of recipients to whom the personal data have been or will be disclosed, including recipients in third countries or international organisations; • where applicable, transfers of personal data to a third country or an international organisation, including the identification of that third country or international organisation and, in the case of transfers referred to in the second subparagraph of Article 49(1), the documentation of suitable safeguards; • where possible, the envisaged time limits for erasure of the different categories of data; • where possible, a general description of the technical and organisational security measures referred to in Article 32(1). Important to note is the difference between the details of processing activities versus the details of a data inventory as the register does not require documenting every data element in the organisation nor its mapping. However, one may argue that a proper register of data processing could not be possible without an existing data inventory and data mapping.
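The Article 30 items listed above translate naturally into a record structure. Below is a minimal sketch of one register entry as a data class; the field values are fictional and the structure is only one possible way to hold this information, not a prescribed format.

```python
# Minimal sketch: one entry in a register of processing activities (GDPR Article 30).
# The example values are fictional; the fields mirror the items listed above.
from dataclasses import dataclass, field

@dataclass
class ProcessingActivity:
    controller: str
    purpose: str
    categories_of_data_subjects: list
    categories_of_personal_data: list
    categories_of_recipients: list
    third_country_transfers: list = field(default_factory=list)  # incl. safeguards, where applicable
    erasure_time_limits: str = ""                                 # where possible
    security_measures: str = ""                                   # Article 32(1), where possible

newsletter = ProcessingActivity(
    controller="Example B.V., dpo@example.org",
    purpose="Sending the monthly newsletter",
    categories_of_data_subjects=["newsletter subscribers"],
    categories_of_personal_data=["name", "email address"],
    categories_of_recipients=["email delivery provider"],
    erasure_time_limits="on unsubscribe",
    security_measures="TLS in transit, access restricted to the marketing team",
)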
• Data discovery for Privacy Data discovery is a term used in data analytics to define the process and tools used to uncover hidden patterns and trends. In the context of privacy, this term is used for technologies that automatically discover workflows across organisational collaborators that include personal data, or that identify personal data existing in semi-structured and unstructured environments.
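A very small sketch of what such discovery tooling does under the hood: scanning unstructured text for patterns that look like personal data. Real products go far beyond two regular expressions; the patterns and sample text here are illustrative assumptions only.

```python
# Minimal sketch of personal data discovery in unstructured text.
# Two illustrative patterns only; real discovery tools use far richer detection.
import re

PATTERNS = {
    "email":       re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "dutch_phone": re.compile(r"\+31\s?6[\s\d]{8,10}"),  # assumption: a crude mobile-number pattern
}

def discover(text: str) -> dict:
    """Return every pattern match found in the text, keyed by data type."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

print(discover("Contact Ada at ada@example.org or +31 6 1234 5678 about ticket 42."))
```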
• Data classification Data classification, in the context of information security, is the classification of data based on its level of sensitivity and the impact to the organisation should that data be disclosed, altered or destroyed without authorisation. The classification of data helps determine what baseline security controls are appropriate for safeguarding that data. In general, it differs from information classification by its granularity – while information classification may be on a higher level (data type or even business process), data classification is usually on data elements. Data classification involves tagging and labelling data elements, which makes it easily searchable and trackable. Also, data classification may be performed for a number of reasons other than security, including ease of access, to comply with regulatory requirements, and to meet various other business or personal objectives.
• Data minimisation This term was coined back in 1980 as part of the OECD principles. According to the EDPS (European Data Protection Supervisor) the principle of ‘data minimisation’ means that a data controller should limit the collection of personal information to what is directly relevant and necessary to accomplish a specified purpose. They should also retain the data only for as long as is necessary to fulfil that purpose. In other words, data controllers should collect only the personal data they really need, and should keep it only for as long as they need it. This has been directly translated into GDPR Articles 5 and 6.
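In code, data minimisation often comes down to something as mundane as collecting fewer fields and deleting records past their purpose. A minimal sketch follows, with hypothetical field names, purpose and retention period.

```python
# Minimal sketch of data minimisation: keep only the fields needed for the stated
# purpose and drop records once they are no longer necessary. Names are illustrative.
from datetime import date, timedelta

NEEDED_FOR_DELIVERY = {"name", "address", "postcode"}   # purpose: delivering an order
RETENTION = timedelta(days=30)                           # assumption: 30-day retention

def minimise(record: dict) -> dict:
    return {k: v for k, v in record.items() if k in NEEDED_FOR_DELIVERY}

def expired(order_date: date, today: date) -> bool:
    return today - order_date > RETENTION

full_record = {"name": "Ada", "address": "Main St 1", "postcode": "1234 AB",
               "birth_date": "1815-12-10", "browsing_history": ["..."]}
print(minimise(full_record))                        # birth date and browsing history are never stored
print(expired(date(2018, 1, 1), date(2018, 3, 1)))  # True -> delete the record
```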
• Anonymity Anonymity of a subject means that the subject is not identifiable. From [ISO/IEC Guide 99]: “Anonymity ensures that a user may use a resource or service without disclosing the user’s identity. The requirements for anonymity provide protection of the user identity. Anonymity is not intended to protect the subject identity.”
• Unlinkability From [ISO/IEC Guide 99]: “Unlinkability ensures that a user may make multiple uses of resources or services without others being able to link these uses together.” Note that true anonymity requires unlinkability.
• Undetectability / Unobservability Undetectability of an item of interest (IOI) means that other parties cannot sufficiently distinguish whether it exists or not. From [ISO/IEC Guide 99]: “Unobservability ensures that a user may use a resource or service without others, especially third parties, being able to observe that the resource or service is being used. “
• Encryption Encryption is the process of encoding information in such a way that only authorised parties can access it. Encryption translates data into another form, or code, so that only people with access to a decryption key or password can read it. Encrypted data is commonly referred to as ciphertext, while unencrypted data is called plaintext. A mathematical procedure for performing encryption on data is called an encryption algorithm. Rather than focusing on usability, the goal of encryption is to ensure the data cannot be consumed by anyone other than the intended recipient(s). Blowfish, AES, RC4, RC5, and RC6 are examples of encryption algorithms.
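As a concrete (if minimal) example, the sketch below encrypts and decrypts a single value with the Python cryptography package’s Fernet recipe (symmetric, AES-based authenticated encryption). Key management, the hard part in practice, is deliberately omitted; the value is illustrative.

```python
# Minimal sketch of symmetric encryption with the 'cryptography' package (Fernet recipe).
# In practice the key must be generated, stored and rotated under strict key management.
from cryptography.fernet import Fernet

key = Fernet.generate_key()             # keep this secret and outside the data store
f = Fernet(key)

ciphertext = f.encrypt(b"1985-04-12")   # e.g. a birth date as plaintext bytes
plaintext = f.decrypt(ciphertext)       # only possible with the key

print(ciphertext)   # unreadable without the key
print(plaintext)    # b'1985-04-12'
```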
• Hashing Hashing is based on the concept of integrity, i.e. making it so that if something is changed you know that it’s changed. Technically, hashing takes arbitrary input and produces a fixed-length string. In hashing, a new message is created from the original message in a particular way by which it cannot be reversed. Unlike encryption, it does not require a key to unlock the message. It is used for verifying files and other content, and in this way ensures that integrity is maintained. Once the message is hashed, its hash is used for comparisons: if the hash of a message is the same, the message is regarded as identical to the original. There are different hashing algorithms in use, such as MD5, SHA, RIPEMD and Tiger.
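A minimal sketch of hashing for integrity checking, using Python’s standard hashlib; the content being hashed is illustrative. Note that for password storage, slow salted schemes (bcrypt, scrypt, Argon2) are used rather than a bare hash.

```python
# Minimal sketch of hashing for integrity verification using the standard library.
import hashlib

original = b"quarterly_report_v1"
digest = hashlib.sha256(original).hexdigest()   # fixed-length, one-way fingerprint

# Later: recompute and compare to detect any change to the content.
received = b"quarterly_report_v1"
print(hashlib.sha256(received).hexdigest() == digest)   # True -> integrity intact
```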
• Tokenization Tokenization is the process of replacing sensitive data with unique identification symbols that retain all the essential information without compromising its security. In its most basic form, it is simply substituting a randomly generated value (token) for a cleartext value and keeping a lookup table (token vault) in a secure place, which maps the cleartext value to the corresponding token. The token data type and length typically remain the same as the cleartext, and the token lookup table becomes the ‘key’ allowing the cleartext value to be retrieved from the token (ref 11). Tokenization does not have to use a mathematical process to transform the sensitive information into the token. There is no algorithm that can be used to derive the original data for a token. Instead, tokenization uses a database, called a token vault, which stores the relationship between the sensitive value and the token. The real data in the vault is then secured, often via encryption. Encryption, hashing and tokenization are all considered compliant measures according to GDPR. It is mostly their purpose that will define which processes will be used and how. Encryption focus is on security, hashing on integrity and tokenization on traceability, so most likely an organisation will use all of them.
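A minimal sketch of vault-based tokenization as described above: a random token of the same length and type replaces the cleartext, and a lookup table maps tokens back. In a real system the vault itself would be encrypted and access-controlled; names and values here are illustrative assumptions.

```python
# Minimal sketch of tokenization with an in-memory token vault.
# A real vault would be a secured, encrypted store with strict access control
# (and would handle token collisions, which this sketch omits).
import secrets

token_vault = {}   # token -> cleartext

def tokenize(cleartext: str) -> str:
    """Replace a cleartext value with a random token of the same length (digits only)."""
    token = "".join(secrets.choice("0123456789") for _ in cleartext)
    token_vault[token] = cleartext
    return token

def detokenize(token: str) -> str:
    return token_vault[token]

t = tokenize("4111111111111111")   # e.g. a card number
print(t)                            # random 16-digit token, format-preserving
print(detokenize(t))                # original value, retrievable only via the vault
```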
• Anonymisation Anonymisation is the process of turning data into a form which does not identify individuals. The GDPR defines anonymised data as “data rendered anonymous in such a way that the data subject is not or no longer identifiable”. It means that data must be stripped of any identifiable information, making it impossible to derive insights about a discrete individual, even for the party that is responsible for the anonymisation.
• Pseudonymisation Pseudonymisation is a method to substitute identifiable data with one or more artificial identifiers, so called pseudonyms. The purpose is to render the data record less identifying while preserving data usability in analytics and processing. The GDPR defines pseudonymisation as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.”
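One common way to implement pseudonymisation is a keyed hash: the same input always yields the same pseudonym (so analytics and joins still work), but re-identification requires the separately held key. A minimal sketch, with an illustrative key and identifier; this is one possible technique, not the only one.

```python
# Minimal sketch of pseudonymisation with a keyed hash (HMAC-SHA256).
# The key is the "additional information" of the GDPR definition and must be
# stored separately from the pseudonymised data. Values here are illustrative.
import hmac
import hashlib

PSEUDONYM_KEY = b"store-this-key-separately"   # assumption: managed outside the dataset

def pseudonymise(identifier: str) -> str:
    return hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256).hexdigest()

print(pseudonymise("customer-00042"))   # stable pseudonym, usable in analytics
print(pseudonymise("customer-00042"))   # same input -> same pseudonym
```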
From a GDPR point of view the biggest difference between these two methods is that anonymisation places the processing and storage of personal data outside the scope of the GDPR. However, one may argue that true anonymisation is not possible. According to the deep learning authority Pete Warden (ref 12), “The Anonymisation process is an illusion. Precisely because there are now so many different public datasets to cross-reference, any set of records with a non-trivial amount of information on someone’s actions has a good chance of matching identifiable public records.” And even if possible and done properly, anonymisation is an irreversible process which may devalue data in such a way that the organisation can no longer perform its business process. GDPR does recognise the need to preserve data utility and therefore accepts pseudonymisation as a means of compliance. Recital 29 of the GDPR aims “to create incentives to apply pseudonymisation when processing personal data” and finds that “measures of pseudonymisation should, whilst allowing general analysis, be possible”. These incentives appear in five separate sections of the Regulation, allowing pseudonymisation as a means for making data processing legal in cases which would otherwise not be lawfully possible. The effectiveness (and legality) of both anonymisation and pseudonymisation will always hinge on their ability to protect data subjects from re-identification. The GDPR states that the introduction of ‘pseudonymisation’ is not intended to preclude any other measures of data protection. Both methods should be accompanied by proper de-identification protocols and security measures, combined with a privacy risk framework.
Privacy risk management framework “We demand rigidly defined areas of doubt and uncertainty!” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
[Figure: DPIA – Data Protection Impact Assessment cycle: description of the envisaged processing; assessment of the necessity and proportionality; assessment of the risks to the rights and freedoms; measures envisaged to address the risks; measures envisaged to demonstrate compliance; documentation; monitoring and review.]
Privacy risk frameworks attempt to assess the practical implementation of privacy protection through an approach grounded in risk management. They help organisations to calculate or estimate their privacy risk in IT systems and business processes, as well as assess which organisational and technical measures are possible and appropriate. Because of their broad nature they have often been extended to a full management framework. Current praxis is mostly dominated by pragmatic frameworks like the Privacy Impact Assessment (PIA) – now known as the Data Protection Impact Assessment (DPIA). However, there are numerous frameworks that have been, or are being, developed which offer a more systematic approach, and their popularity is rising. Below is an overview with short descriptions of their constellations.
• DPIA – Data Protection Impact Assessment (aka PIA – Privacy Impact Assessment) A Privacy Impact Assessment is a type of impact assessment designed to describe the processing, assess the necessity and proportionality of the processing, and help manage the risks to the rights and freedoms of natural persons resulting from the processing of personal data. The concept emerged during the 1980s and was systematically described by Australian privacy expert Roger Clarke in 1996. Under GDPR, it has been renamed to DPIA and made mandatory when the processing is “likely to result in a high risk to the rights and freedoms of natural persons” (Article 35(1)). GDPR provides data controllers with flexibility to determine the precise structure and form of the DPIA, but the Article 29 Working Party and almost all Data Protection Authorities have provided useful templates, which all follow the generic EU PIAF approach (ref 13).
• Calo’s dichotomy Ryan Calo is a law professor at the University of Washington and faculty director of the interdisciplinary Tech Policy Lab. In his work “The Boundaries of Privacy Harm” (ref 14) he describes privacy harm as falling into two related categories:
• Subjective category – the privacy harm of an unwelcome mental state (anxiety, embarrassment…) caused by the belief that one is being watched or monitored. This can vary from a landlord spying on his tenants to governmental surveillance (the ‘Big Brother’ effect).
• Objective category – the privacy harm of negative, external actions caused by reference to personal information. This can be as extreme as identity theft or as common as the use of a medical record to refuse insurance.
The approach uncouples privacy harm from privacy violations, demonstrating that there can be a privacy violation with no privacy harm (and vice versa). It introduces a ‘limiting principle’ – identifying when another value (like equality or freedom of speech) is more directly at stake. It also creates a ‘rule of recognition’ that permits the identification of a privacy harm when no other harm is apparent. This allows an organisation to assess privacy threats according to their impact (and not their nature).
• Contextual integrity Contextual integrity (CI) is a framework developed by New York University professor Helen Nissenbaum (ref 15). It observes that many privacy issues occur when information is taken from one context and brought into another. For example, it is deemed appropriate to share information about one’s health with a physician, but not during a job interview. Each context in which we share information has associated roles, activities, norms, and values and may evolve over time. (We were all fine with sharing everything with social media, but that sentiment is changing fast.) Nissenbaum introduces the notion of contextual information norms – the norms of appropriateness and the norms of distribution (from one party to another). Norms are “characterised by four key parameters: context, actors, attributes, and transmission principles”. Nissenbaum also provides a detailed step-by-step process to identify privacy threats, called the “Augmented Contextual Integrity Decision Heuristic”:
• Describe the new practice in terms of information flows.
• Identify the prevailing context. Establish context at a familiar level of generality (e.g., ‘health care’) and identify potential impacts from contexts nested within it, such as ‘teaching hospital.’
• Identify information subjects, senders, and recipients.
• Identify transmission principles.
• Locate applicable entrenched informational norms and identify significant points of departure.
• Prima facie assessment: There may be various ways a system or practice defies entrenched norms.
• Evaluation I: What might be the harms, the threats to autonomy and freedom? What might be the effects on power structures, implications for justice, fairness, equality, social hierarchy, democracy, and so on?
• Evaluation II: Ask how the system or practices directly impinge on values, goals, and ends of the context. In addition, consider the meaning or significance of moral and political factors in the light of contextual values, ends, purposes, and goals. On the basis of these findings, contextual integrity recommends in favour of or against systems or practices. Often defined as a philosophical approach to privacy, this framework goes much further in even providing the logic for expressing and reasoning about norms. For example, this statement “a financial institution may not disclose personal information, unless such financial institution provides or has provided to the consumer a notice.” can be expressed as • IF send (financial-institution, third-party, personalinformation) • THEN PREVIOUSLY send (financial-institution, consumer, notification) • OR EVENTUALLY send (financial-institution, consumer, notification) This makes this framework translatable into business processes and software engineering in a truly pragmatic way.
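To illustrate how such a norm becomes code, here is a minimal sketch that checks the financial-institution rule above against a log of information flows. It is not the framework’s own notation: the flow fields, names and the collapsing of PREVIOUSLY/EVENTUALLY into a simple existence check are assumptions made for brevity.

```python
# Minimal sketch: checking one contextual-integrity norm against a flow log.
# Norm: a financial institution may not disclose personal information to a third
# party unless it provides (or has provided) the consumer a notice.
from dataclasses import dataclass

@dataclass
class Flow:
    sender: str      # e.g. "financial-institution"
    recipient: str   # e.g. "third-party" or "consumer"
    attribute: str   # e.g. "personal-information" or "notification"

def norm_satisfied(flows: list[Flow]) -> bool:
    """Disclosure to a third party is only allowed if a notification to the consumer
    occurs somewhere in the flow history (PREVIOUSLY/EVENTUALLY collapsed to 'exists')."""
    disclosed = any(f.sender == "financial-institution" and f.recipient == "third-party"
                    and f.attribute == "personal-information" for f in flows)
    notified = any(f.sender == "financial-institution" and f.recipient == "consumer"
                   and f.attribute == "notification" for f in flows)
    return notified or not disclosed

flows = [Flow("financial-institution", "consumer", "notification"),
         Flow("financial-institution", "third-party", "personal-information")]
print(norm_satisfied(flows))   # True: the disclosure was accompanied by a notice
```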
• Solove’s taxonomy of privacy (harms) “A Taxonomy of Privacy” (ref 16) is a work by Daniel J. Solove, professor of law at the George Washington University Law School and privacy expert, in which he breaks down 16 privacy issues into four areas. It is a classification which offers not only the identification of potential privacy risks but also advocates a zero-sum trade-off approach in which the value of privacy and conflicting interests are carefully balanced.
• Information collection: surveillance, interrogation
• Information processing: aggregation, identification, insecurity, secondary use, exclusion
• Information dissemination: breach of confidentiality, disclosure, increased accessibility, blackmail, appropriation, distortion, exposure
• Invasion: intrusion, decisional interference
[Figure: Solove’s taxonomy – information flows from the data subject through information collection to the data holders (information processing), then outwards through information dissemination, with invasions acting directly upon the data subject.]
• NISTIR 8062 by NIST (US National Institute of Standards and Technology) (ref 17) Building on the impact of the Risk Management Frameworks for cybersecurity, NIST is developing a Privacy Risk Management Framework. Rather than emphasising the classic cybersecurity triad of Confidentiality, Integrity, and Availability it contributes the core privacy principles of Predictability, Manageability, and Disassociability. According to NISTIR 8062: • Predictability is the enabling of reliable assumptions by individuals, owners, and operators about personal information and its processing by an information system. • Manageability is providing the capability for granular administration of personal information including alteration, deletion, and selective disclosure. • Disassociability is enabling the processing of personal information or events without association to individuals or devices beyond the operational requirements of the system.
[Figure: NISTIR 8062 overview – privacy requirements (laws, regulations, FIPPs) and risk models feed a risk assessment; these produce privacy engineering and security objectives and a mapping of system capabilities/requirements; a risk management framework is used to select controls; the resulting privacy impact assessment documents the risks identified, the controls implemented, how the system meets requirements, and gives assurance that the system meets requirements and addresses risk.]
• CNIL Methodology for Privacy Risk Management (ref 18) The French Data Protection Authority CNIL (Commission nationale de l’informatique et des libertés) has led the way in the translation of privacy policies into applicable measures. The CNIL methodology “describes a method for managing the risks that the processing of personal data can generate to individuals”. This methodology not only provides a complete analytical approach but is also linked to a catalogue of measures and ready-to-use templates to address all risks assessed with it. The approach consists of five iterative steps, studying:
1 the context of the processing of personal data
2 the feared events in this particular context
3 the possible threats (if needed)
4 the risks involved (if needed)
5 the appropriate measures to treat them
In addition, this is a continuous improvement process. It therefore requires monitoring changes over time (context, risks, measures…) and an update with each significant change.
• LINDDUN Privacy threat modelling (ref 19) LINDDUN is a privacy threat analysis methodology, developed by Mina Deng, that supports analysts in eliciting privacy requirements. It shares the principles of the CNIL method, but it is a more systematic engineering approach based on data flow diagrams and privacy threat tree patterns. LINDDUN is an acronym which stands for:
• Linkability
• Identifiability
• Non-repudiation
• Detectability
• Disclosure of information
• Content Unawareness
• Policy and consent Non-compliance
This methodology systematically identifies the privacy threats in a system and the solutions that mitigate them, by following six linear steps:
1 Define a data flow diagram (DFD), departing from either the requirements specification or the system architecture while focusing on the internal data stores and the cross-organisational data flows.
2 Map privacy threat categories to the DFD elements just defined, according to a predefined table that details potential threat categories for each type of DFD element (a sketch of such a mapping follows this list).
3 Identify threat scenarios, according to the guidance provided by a set of privacy-threat-tree patterns.
4 Prioritise threats, depending on the risk associated with each one.
5 Elicit mitigation strategies, according to a taxonomy of strategies and a table that maps threat types to strategies.
6 Select Privacy-Enhancing Technologies, constrained by the mitigation strategies just elicited.
[Figure: The LINDDUN process – steps 1 to 4 form the problem space, steps 5 and 6 the solution space, ending in the selection of PETs.]
The technical measures in the context of privacy are called Privacy Enhancing Technologies (PETs).
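Before moving on to PETs, here is a minimal sketch of LINDDUN step 2: mapping DFD element types to candidate threat categories. The mapping below is an illustrative subset and an assumption of mine; for the real table, consult the official LINDDUN documentation.

```python
# Minimal sketch of LINDDUN step 2: candidate threat categories per DFD element type.
# The mapping is an illustrative assumption, not the official LINDDUN mapping table.
THREATS_BY_ELEMENT = {
    "entity":     ["Linkability", "Identifiability", "Content Unawareness"],
    "data_flow":  ["Linkability", "Identifiability", "Non-repudiation",
                   "Detectability", "Disclosure of information"],
    "data_store": ["Linkability", "Identifiability", "Non-repudiation",
                   "Detectability", "Disclosure of information",
                   "Policy and consent Non-compliance"],
    "process":    ["Linkability", "Identifiability", "Non-repudiation",
                   "Detectability", "Disclosure of information"],
}

# Hypothetical DFD elements for a web shop (names made up for illustration).
dfd = [("customer", "entity"), ("order flow", "data_flow"), ("orders DB", "data_store")]

for name, element_type in dfd:
    print(name, "->", THREATS_BY_ELEMENT[element_type])
```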
Privacy Enhancing Technologies (PETs) There are no uniform definitions of PETs, but existing definitions all state that, to qualify as PETs, technologies have to reduce the risk of contravening privacy principles and legislation and minimise the amount of personal data being held. The European Commission in its Communication to the European Parliament and the Council on Promoting Data Protection by Privacy Enhancing Technologies (PETs) describes a PET as “a coherent system of ICT measures that protects privacy by eliminating or reducing personal data or by preventing unnecessary and/or undesired processing of personal data, all without losing the functionality of the information system.” There is a tremendous number of PETs available, many of them consisting of different technologies or being applicable just within specific fields. These complexities
make it difficult to understand their nature, their usefulness and their applicability. The use of classification with specific terminology (e.g. ‘data protection tools’, ‘data minimisation tools’ etc.) may help to better grasp their nature and align them with their purpose. There is a variety of PET classifications, mostly based on their technological characteristics. One of the most comprehensive overviews of those frameworks is provided by Karlstad University professor Lothar Fritsch (ref 20) and by a London Economics study (ref 21). Below is an overview of their interpretation of the different classifications.
• FIDIS (2007) PETs classification According to the European Project on the Future of Identity in the Information Society (FIDIS), PETs can be divided into ‘opacity tools’ and ‘transparency tools’. The purpose of transparency tools is to provide insight into data processing. Opacity tools are meant to hide a user’s identity.
Transparency tools
• Definition: tools that show clearly to a person what personal data is being processed, how it is processed, and by whom it is processed
• Non-technical examples: legal rights to be informed about data processing; privacy audits
• Technical examples: database audit interfaces; audit agents; log files
Opacity tools
• Definition: tools that hide a person’s identity or his relationship to data as it is processed by someone else
• Non-technical examples: pseudonymous access to online services; election secrecy
• Technical examples: MixMaster anonymous email; TOR anonymising web service; pseudonyms
• META (2005) PETs classification The Meta Group study for the Danish Government also divides PETs into two groups, but uses slightly different terminology – privacy management instead of transparency and privacy protection instead of opacity. It also classifies further in terms of purpose (curative or informative) and function (unobservability, unlinkability and anonymity).
Privacy Protection
• Pseudonymise tools: enabling transactions without requiring private information
• Anonymiser products and services: providing services without revealing the user’s address and identity
• Encryption tools: protecting email, documents and transactions from being read by other parties
• Filters and blockers: preventing unwanted email and web content from reaching the user
• Track and evidence erasers: removing electronic traces of the user’s activity
Privacy Management
• Informational tools: creating and checking privacy policies
• Administrative tools: managing user identity and permissions
• PET-Staircase The PET-Staircase proposed by Koorn, 2004 (ref 22), divides PETs into four categories, ordered by increasing effectiveness of the PET option:
• General PET controls: encryption; access security; role-based access controls; biometrics; quality-enhancing technologies
• Separation of data: splitting of the identity domain and pseudo-identity domain; identity protector controlled by the data processor, a TTP or the individual concerned if personal data under personal control applies
• Privacy management systems: Privacy Incorporated Software Agent (PISA); Platform for Privacy Preferences Project (P3P) and Enterprise Privacy Authorisation Language (EPAL); privacy ontology; privacy rights management
• Anonymisation: no registration of personal data; destruction of personal data immediately after required processing
• Clarke’s (2007) PETs classification Roger Clarke’s classification of PETs is based on their functional characteristics, taking into account legal and practical issues (for example, the PET for a certain use case may exist but its use may be illegal or it may destroy the information’s usability).
• Pseudo-PETs: privacy seals, P3P
• Counter-technology: counters one specific privacy threat, e.g. SSL encryption or spyware removal
• Savage PETs: provide untraceable anonymity
• Gentle PETs: pseudonymity tools balanced with accountability and identity management
• PETs classification after Hacohen (2009) The classification by Yoram Hacohen, Head of the Israeli Law, Information and Technology Authority (ILITA), follows the way personal data is processed in the real world. It divides PETs into technologies that are used before the processing of data and technologies that protect the personal data during storage and processing. It is also the one most closely aligned with the technologies and processes mentioned in GDPR.
Pre-usage PETs
• Data minimisation
• Anonymisation
• Limitation of use
• e-consent mechanisms
Usage PETs
• Data quality
• Verification
• Encryption
• Watermarking, tagging, sticky policies
• Usage logging
• PPDM
The goal of privacy-preserving data mining (PPDM) is to develop algorithms that modify the original data in such a way that personal data remains private even after the mining process. There are several families of PPDM methods, ranging from (statistical) data modification to more cryptographic approaches (ref 51); an overview of the main approaches is given below. The ENISA paper “Privacy by design in big data” (ref 25) describes in which situations certain methods and techniques apply.
• An exhaustive summary of PETs does not exist
As the technology is constantly changing, it is hard to provide an up-to-date overview of PETs and their use cases. The report prepared by the Technology Analysis Division of the Office of the Privacy Commissioner of Canada provides an excellent overview per use case (ref 23), and the Stanford Cyberlaw wiki (ref 24) provides an alphabetical overview. To gain their full benefit for privacy and data protection, PETs need to be rooted in a data governance strategy that is applied in practice.
Overview of privacy-preserving data mining (PPDM) approaches:
– Data modification
  – Anonymisation approach: k-anonymity (via generalisation – attribute (AG) or cell (CG) – and suppression – attribute (AS), cell (CS) or tuple (TS)), l-diversity, confidence bounding, (a,k)-anonymity, personalised privacy, (X,Y)-privacy, p-sensitive k-anonymity, (k,e)-anonymity, t-closeness
  – Perturbation approach: value distortion, probability distribution, Gaussian perturbation
  – Swapping approach
  – Hiding approach
  – Sampling approach
  – Randomisation approach
  – Miscellaneous approaches
– Cryptographic methods
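To make the anonymisation branch of this overview concrete, the following is a minimal, illustrative Python sketch of attribute generalisation and a k-anonymity check on a toy dataset; the columns, generalisation rules and k value are invented for the example and are not drawn from any of the referenced papers.

```python
from collections import Counter

# Toy records: quasi-identifiers (age, postcode) plus a sensitive attribute.
records = [
    {"age": 34, "postcode": "1071AB", "diagnosis": "flu"},
    {"age": 36, "postcode": "1071CD", "diagnosis": "asthma"},
    {"age": 41, "postcode": "1072EF", "diagnosis": "flu"},
    {"age": 44, "postcode": "1072GH", "diagnosis": "diabetes"},
]

def generalise(record):
    """Attribute generalisation (AG): coarsen quasi-identifiers into broader classes."""
    return (
        f"{record['age'] // 10 * 10}-{record['age'] // 10 * 10 + 9}",  # age band
        record["postcode"][:4],                                        # postcode prefix
    )

def is_k_anonymous(rows, k):
    """True if every combination of generalised quasi-identifiers occurs at least k times."""
    counts = Counter(generalise(r) for r in rows)
    return all(c >= k for c in counts.values())

print(is_k_anonymous(records, k=2))  # True: each (age band, postcode prefix) group has >= 2 rows
```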
FRAMEWORKS FOR IMPLEMENTING PRIVACY “It is a mistake to think you can solve any major problems just with potatoes.” Douglas Adams, Life, the Universe and Everything
Awareness and knowledge are essential for achieving Privacy by Design, but an implementation framework is indispensable. Using a policy or a single PET does not assure privacy: privacy is multi-dimensional and can never be addressed with just one solution, nor implemented in one step. Implementation requires a logical and detailed approach, preferably with a high level of standardisation.
Since the late 1960s, many privacy frameworks have been developed. Some are area specific (territorial or technical), some are regulatory binding and some aim to achieve standardisation. In fact, the GDPR is a legal privacy framework, while Privacy by Design is presented as one. They can be grouped into:
• General principle frameworks
• Standardisation frameworks
• Domain-specific frameworks
• Implementation frameworks for software engineering
General principle frameworks usually address guidelines, setting up privacy programs, cross-border rules and accountability principles. The best known and one of the oldest is the OECD Guidelines Governing the Protection of Privacy and Transborder Flows of Personal Data – developed in 1980 and revised into a full privacy framework by 2013. The ‘Basic Principles’ of the Guidelines are still the same and have been the stepping stone for almost all privacy-related development over the past 40 years. Other general frameworks are the APEC (Asia-Pacific Economic Cooperation) Privacy Framework, the FTC FIPPs (Fair Information Practice Principles) and GAPP (Generally Accepted Privacy Principles) from AICPA/CICA. However, the general frameworks are high-level overviews and do not offer a detailed software implementation manual.
The standardisation frameworks try to bridge that gap by developing standards and templates for implementation. Most of them are still under development, tend to be quite lengthy and are, by their nature, rigid in their approach. They can provide great guidance on specific processes, and their development is worth monitoring (or contributing to, if you are brave enough). The most interesting and applicable are ISO/IEC 29100:2011 (ref 26), ISO/IEC 27550 Privacy engineering (ref 27), the ISACA Privacy Principles and Program Management Guide (ref 28) and the IEEE P7002™ Data Privacy Process (ref 29).
The domain-specific frameworks are highly detailed and usually provide more than enough information about implementation. They are often use-case or niche based, so their operational usability is quite narrow and may not cover the full scope of a company-wide privacy implementation. Some great examples regarding smart meters, mobiles and the like are mentioned in “Operationalising Privacy by Design” (ref 30), and LACE (ref 31) and SoLAR (ref 32) developed privacy frameworks for learning analytics. The most promising development is the upcoming set of privacy frameworks for AI and big data.
Most of the implementation frameworks designed for software engineering are agnostic and applicable in every software domain, but as yet not well known. To save you from reinventing the wheel (or some other kind of already invented infinite drive), a few of them are listed below to help assemble the Privacy by Design starship. The International Security Trust and Privacy Alliance (ISTPA) was a global alliance of business and technology providers that defined the first privacy framework for building operational technology solutions. The list of participating partners (IBM, Teradata, CA…) and the level of implementation were impressive. Imagine working privacy rules engines or consent audit trails embedded at database row level – it was all there (and still is: open one of the famous reference models and you will easily find them).
• OASIS PMRM and OASIS PbD-SE
OASIS is a non-profit consortium that drives the development, convergence and adoption of open standards for the global information society. It was chosen by ISTPA to further develop the Privacy Management Reference Model (PMRM).
The PMRM (ref 33) is a methodology that covers a series of tasks and provides templates and applicable examples for each of them. The tasks range from defining and describing the scope of use cases to the full specification and implementation of privacy controls within a privacy architecture.
OASIS also provides Privacy by Design Documentation for Software Engineers (OASIS PbD-SE, ref 34). The annex of this documentation describes use cases at a detailed level, linking privacy requirements to technical measures and PETs, as in the healthcare PII example below.
[Figure: OASIS PbD-SE healthcare PII use case. Actors – doctor, head nurse, data scientist and public researcher – view treatments, review recommended treatments, view alternative patients’ treatments and analyse data through the system (“SuperContainer”), which applies: security (all communication over a secure SSL connection); pseudonymisation (PII replacement – replacing PII with codes in the program’s input data); notice & agreement (a privacy notice on, and agreement to, the storage and use of obtained data); and anonymisation (default: k-anonymity, l-diversity).]
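As a minimal sketch of the “replace PII with codes” control in the use case above, the snippet below pseudonymises a patient identifier with a keyed hash before the record enters the analytics input data. The field names and the inline key are illustrative assumptions; in practice the key would live in a separate, access-controlled store or with a trusted third party.

```python
import hmac
import hashlib

# Illustrative secret held by the identity protector / key custodian, never by analysts.
PSEUDONYMISATION_KEY = b"replace-with-a-key-from-a-managed-key-store"

def pseudonymise(value: str) -> str:
    """Replace a direct identifier with a stable code (keyed hash)."""
    return hmac.new(PSEUDONYMISATION_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

patient_record = {"patient_id": "NL-123456", "treatment": "physiotherapy", "week": 12}

# The analytics input keeps the treatment data but only a code in place of the identifier.
analytics_input = {**patient_record, "patient_id": pseudonymise(patient_record["patient_id"])}
print(analytics_input)
```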
• MITRE Privacy Engineering Framework (ref 35)
MITRE, a US not-for-profit organisation that operates federally funded research and development centres, developed an engineering framework for building PbD into government systems. It aligns privacy engineering with standard systems engineering methods, with extra attention to privacy requirements definition, design and testing.
[Figure: MITRE Privacy Engineering Framework – privacy requirements definition, privacy design & development, and privacy verification & validation mapped onto the systems engineering life cycle (project definition, concept development, requirements engineering, system architecture, system design & development, systems integration, test & evaluation, and transition, operation & maintenance).]
• PRIPARE
Preparing Industry to Privacy-by-design by supporting its Application in Research (PRIPARE, ref 36) is an EU project whose main objective is to facilitate the application of a privacy- and security-by-design methodology. The project has developed a Privacy- and Security-by-Design Methodology Handbook for implementing privacy and fostering compliance with the GDPR. The methodology, which uses OASIS PMRM as a starting point, aligns not only with the ISO standards and the CNIL risk framework but also with agile and waterfall project methodologies, and it provides detailed templates and examples for all activities. For the implementation of technical measures, the PRIPARE framework uses the concept of privacy & security patterns, which have been developed within the Privacy Design Strategies.
[Figure: PRIPARE conceptual model of technical and organisational methods for the protection of personal data, relating risks, threats and feared events, P&S targets and guidelines, P&S controls, the level of compliance, laws & regulations, policies & principles, stakeholders, core concerns, the P&S architecture & design, privacy & security patterns, strategies, tactics, mechanisms, PETs, quality attributes, the functional description, PIA, and the domain, technology and services they apply to. Legend: ReqOp, PEARs, PMRM, Risk Management, PIA, Compliance and Governance.]
• Privacy Design Strategies
Privacy Design Strategies map ‘fuzzy’ legal concepts to concrete data protection goals to help control data processing. Developed by Jaap-Henk Hoepman (ref 37), they are not only part of PRIPARE but are also considered a de-facto implementation of Privacy by Design by ENISA (the European Union Agency for Network and Information Security) and DECODE (DEcentralised Citizens Owned Data Ecosystem), as well as by many of the EU Data Protection Authorities. In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. Software design strategies are about organising the design activities within that context during the system development life cycle. The privacy design strategies are derived from existing software and privacy principles as well as data protection laws. They correspond to the software analysis process, while privacy design patterns are used to describe the design and the PETs for implementation. There are eight privacy design strategies, divided into two groups – data oriented and process oriented.
[Figure: The eight privacy design strategies positioned between the data subject and the data controller – the data-oriented strategies MINIMISE, SEPARATE, ABSTRACT and HIDE, and the process-oriented strategies INFORM, CONTROL, ENFORCE and DEMONSTRATE.]
For each privacy design strategy, the description, the associated privacy design patterns and example PETs are listed below.

Minimise
Description: Limit as much as possible the processing of personal data.
Privacy design patterns: EXCLUDE – refraining from processing a data subject’s personal data, partly or entirely, akin to blacklisting or opt-out. SELECT – deciding on a case-by-case basis on the full or partial usage of personal data, akin to whitelisting or opt-in. STRIP – removing unnecessary personal data fields from the system’s representation of each user. DESTROY – completely removing a data subject’s personal data.
PETs: Select before you collect; anonymisation.

Separate
Description: Distribute or isolate personal data as much as possible, to prevent correlation.
Privacy design patterns: DISTRIBUTE – partitioning personal data so that more access is required to process it. ISOLATE – processing parts of personal data independently, without access or correlation to related parts.
PETs: Distributed processing and storage where feasible; split database tables; secure multi-party computation; unlinkability.

Abstract
Description: Limit as much as possible the detail in which personal data is processed.
Privacy design patterns: SUMMARISE – extracting commonalities in personal data by finding and processing correlations instead of the data itself. GROUP – inducing less detail from personal data prior to processing, by allocating it into common categories. PERTURB – adding noise to or approximating the real value of a data item (a minimal perturbation sketch follows after this table).
PETs: Aggregation over time and geography; dynamic location granularity; k-anonymity; differential privacy.

Hide
Description: Prevent personal data from becoming public or known.
Privacy design patterns: RESTRICT – preventing unauthorised access to personal data. MIX – processing personal data randomly to reduce correlation. ENCRYPT – encrypting data (in transit or at rest). OBFUSCATE – preventing understandability of personal data to those without the ability to decipher it. DISSOCIATE – removing the correlation between different pieces of personal data.
PETs: Encryption; onion routing; anonymous credentials; homomorphic encryption; attribute-based credentials; pseudonymisation.
Inform
Description: Inform data subjects about the processing of their personal data.
Privacy design patterns: SUPPLY – making available extensive resources on the processing of personal data, including policies, processes and potential risks. NOTIFY – alerting data subjects to any new information about the processing of their personal data in a timely manner. EXPLAIN – detailing information on personal data processing in a concise and understandable form.
PETs: (Algorithmic) transparency; data breach notifications; UI design and privacy icons; platform for privacy preferences.

Control
Description: Provide data subjects with control over the processing of their personal data.
Privacy design patterns: CONSENT – only processing the personal data for which explicit, freely given and informed consent is received. CHOOSE – allowing for the selection or exclusion of personal data, partly or wholly, from any processing. UPDATE – providing data subjects with the means to keep their personal data accurate and up to date. RETRACT – honouring the data subject’s right to the complete removal of any personal data in a timely fashion.
PETs: Informed consent; privacy dashboard; user-centric identity management.

Enforce
Description: Commit to processing personal data in a privacy-friendly way, and enforce this.
Privacy design patterns: CREATE – acknowledging the value of privacy and deciding upon policies which enable it, and processes which respect personal data. MAINTAIN – considering privacy when designing or modifying features, and updating policies and processes to better protect personal data. UPHOLD – ensuring that policies are adhered to by treating personal data as an asset, and privacy as a goal to incentivise as a critical feature.
PETs: Access control; privacy policy.

Demonstrate
Description: Demonstrate that you are processing personal data in a privacy-friendly way.
Privacy design patterns: LOG – tracking all processing of data, without revealing personal data, and securing and reviewing the information gathered for any risks. AUDIT – examining all day-to-day activities for any risks to personal data, and responding to any discrepancies. REPORT – analysing collected information on tests, audits and logs periodically to review improvements to the protection of personal data.
PETs: Privacy management system; sticky policies; logging.
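As promised under the Abstract strategy, here is a minimal sketch of the PERTURB pattern: releasing a count with Laplace noise in the style of differential privacy. The epsilon value and the query are illustrative assumptions; this is a teaching sketch, not a production-grade differential privacy implementation.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float = 0.5) -> float:
    """PERTURB: release a count with noise calibrated to a counting query (sensitivity 1)."""
    return true_count + laplace_noise(scale=1.0 / epsilon)

# E.g. publish how many data subjects in a cohort had a given treatment, without exact figures.
print(round(noisy_count(true_count=42), 1))
```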
As all of the frameworks mentioned are modular rather than rigid, reusing only the elements that suit a company’s needs seems to be the best approach. This approach is perfectly described in “The Privacy Engineer’s Manifesto” (ref 5), which applies multiple frameworks within several different scenarios. For a true understanding of privacy in software engineering it is a must-read.
VALUE INCREASING EFFECT OF PRIVACY “Many were increasingly of the opinion that they’d all made a big mistake in coming down from the trees in the first place. And some said that even the trees had been a bad move, and that no one should ever have left the oceans” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
One can argue that the implementation of privacy, although relevant and necessary from a legal and ethical point of view, will have a tremendously negative impact on organisations and the economy as a whole. This is based not just on the cost of potential fines and the implementation costs of privacy measures, but on the fact that privacy laws will change the way we conduct business. Recent studies (ref 38) estimate that the impact of the GDPR may result in a serious decrease in profits from data analytics, a loss of related jobs and lost operational profitability due to inaccurate targeting and consumers not opting in for services. There are also concerns (ref 39) that larger firms with existing (consented) databases and the means to implement data protection measures may be favoured over entrants and small firms. While almost all of the proposed privacy guidelines seem to have a negative impact on economic value creation, this is only true if one conducts business in a universe that is itself not changing. In fact, the guidelines are likely to increase profits in a universe where consumers are increasingly aware of, and willing to use, their empowerment. Let’s look at a few guidelines and how the odds of their profitability are changing.
• Data minimisation
The data minimisation concept goes against the ‘Big Data’ hypothesis – the common belief that having more data leads to better results. Without going into the scientific discussion of whether this is always the case (see the no-free-lunch theorem, ref 40), one can question how good the results are if the quality and origins of the data are ungoverned. As Dr. David Bray from Harvard University stated in an interview with CXOTalk (ref 41): “If you have a lot of data but it’s extremely biased or it’s missing what you really want to focus on as a business, then it’s not going to be practically relevant to what you’re trying to achieve.” He also mentioned the often forgotten fact that we are using statistical methods developed decades ago, and that when they are applied to large sets “you may find things that show up mathematically as appearing to be statistically significant but, in the real world, might not actually be correlated whatsoever”. This is exactly the point at which the current universe is changing, as we see the rise of so-called “minimalist machine learning” algorithms, like the one developed at CAMERA (ref 42), that actually perform better using a very small set of data. To go even further, the analytics community is actually letting go of the idea of massive storage. Protecting large data sets is not only expensive; applying proper de-identification methods like pseudonymisation, or even deletion, does not remove the risk of re-identification in tremendously large-scale datasets. In the paper “Solving AI’s Privacy Problem” (ref 43), researchers give us insight into the world of AI practitioners, describing initiatives like the OPen ALgorithms (OPAL) project (ref 44). This initiative enables machine learning algorithms to be submitted to and trained on pseudonymised data sets while ensuring the full anonymity of individuals: the algorithms travel to the data rather than the data to the algorithms, and only aggregate answers are returned. So, while the hype calls data minimisation contradictory and limiting with regard to analytics, developments in the field show otherwise: it is actually being used to improve efficiency – in time, money and impact.
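A minimal sketch of the OPAL idea described above, under the assumption of a toy record set and a hypothetical query interface: the analyst’s aggregation runs next to the (pseudonymised) data, and only aggregates over sufficiently large groups are returned.

```python
# Hypothetical data holder: raw, pseudonymised records never leave this module.
_RECORDS = [
    {"pseudo_id": "a1", "region": "north", "visits": 3},
    {"pseudo_id": "b2", "region": "north", "visits": 5},
    {"pseudo_id": "c3", "region": "south", "visits": 2},
]

MIN_GROUP_SIZE = 2  # refuse answers computed over too few individuals

def run_query(group_by: str, metric: str) -> dict:
    """Accept a submitted aggregation, run it next to the data, return only safe aggregates."""
    groups: dict[str, list[int]] = {}
    for record in _RECORDS:
        groups.setdefault(record[group_by], []).append(record[metric])
    return {
        key: sum(values) / len(values)
        for key, values in groups.items()
        if len(values) >= MIN_GROUP_SIZE  # suppress small groups instead of releasing them
    }

print(run_query(group_by="region", metric="visits"))  # {'north': 4.0}; 'south' is suppressed
```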
• Consent
Studies estimate that almost 60% of consumers would not be inclined to opt in for marketing purposes and that this may shrink databases by 39%. However, 82% of respondents believe that data obtained through explicit consent will be more valuable. Explicit consent may hurt small and new firms more than established ones, as people tend to trust ‘the known’ more. But the same study (ref 39) also states that the competitive aspect is heavily influenced by consumers’ concerns about privacy. And as we enter the galaxy ruled by Generation Z, born in the mid-1990s and beyond, explicit consent is almost a prerequisite for doing business. According to recent global surveys conducted by the IBM Institute for Business Value (ref 45, ref 46), more than 55% of these cyber-savvy young people prefer firms that allow them to choose when, how and what they wish to share and be contacted about. They also favour an approach of active engagement and, when it is provided, their willingness to share data increases.
• Purpose limitation
Purpose limitation – as specified in Article 5(1)(b) of the GDPR – states that personal data shall be “collected for specified, explicit and legitimate purposes” and that further processing in a manner incompatible with those purposes is prohibited. This is often described as an inability to do any further analysis with the data, especially because the innovative secondary uses of data nowadays cannot be predicted up front. Often forgotten is the fact that further processing is allowed if it is conducted in a compatible manner. Article 6(4) lists several factors for determining and achieving compatibility, varying from the context to the existence of appropriate safeguards. The German Government’s Digital Summit provided a paper (ref 47) with a detailed description of how pseudonymisation – as an appropriate safeguard – can be used to facilitate further data analysis and processing fully compliant with the GDPR.
• Right to erasure / right to be forgotten
The right to erasure – often referred to as the right to be forgotten – as stated in GDPR Article 17, gives data subjects the possibility of having their personal data deleted if they no longer want them processed and there is no legitimate reason to keep them. However, this right is not absolute, as there are many limitations to it (see recitals 65 and 73). One could argue that in this case only personal data processed on the basis of consent are ‘at limited stake’ when this right is exercised. But it gets much more complex in the case of analytics.
When the right to erasure does not apply – when personal data processing is necessary for:
• the right of freedom of expression and information;
• compliance with another legal obligation of the controller under EU or national law;
• the exercise of official authority vested in the controller;
• the public interest – a task carried out in the public interest, public interest in the area of public health, or archiving purposes in the public interest;
• scientific or historical research and statistical purposes;
• legal claims – their establishment, exercise or defence.
The right to erasure is not an absolute or unconditional right: data controllers need to take into account aspects such as possibility, proportionality, costs, overriding legitimate grounds and more. In some cases (e.g. children, unlawful processing, the withdrawal of consent and certainly explicit consent) there is more at stake than in others.
According to the WP29 Guidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679 (the GDPR), the right to erasure extends to input and output data alike, meaning that if profiling is based on consent and the consent is withdrawn, the controller must erase the relevant personal data. If a data subject is in a legal dispute or wants to challenge an unfair decision after exercising the right to erasure (because, for example, at the time of erasure they were unaware of the decision making), these two rights may even contradict each other.
Instead of diving into the legal complications of this, there is a much more elegant solution called ‘Controlled Linkability’, or Anonosising® data, as proposed by Mike Hintze and Gary LaFever (ref 52). They state that “the same data can represent one level of de-identified data to one entity, and another level to another entity – depending on who controls keys necessary to re-identify the data. For the entity holding re-identification keys, the data may represent one level of de-identification, but for an entity that does not control or have access to the keys, the data may represent a higher level of de-identification. Exclusions under GDPR Articles 11(2) and 12(2) for controllers ‘not in a position to identify data subjects’ may apply to controllers who do not have access to those keys, but they may not apply to controllers with access to both data and the keys necessary to re-identify it.” So, if a data controller applies pseudonymisation and hands the key to the data subject and/or the legal authorities before deleting its own de-identification key, the rights of the data subject and GDPR compliance are truly in place: the data controller cannot identify the data subject, while the data subject can exercise all their rights and any other legal grounds can be addressed by the proper jurisdiction. In addition, this makes the personal data in back-ups and archives non-identifiable as well.
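A minimal sketch of the controlled-linkability idea, under deliberately simplified assumptions about key handling: each data subject gets their own re-identification key, the analytics store holds only pseudonyms and non-identifying payloads, and handing over and deleting the controller’s copy of the key leaves the controller unable to re-identify the data – including in back-ups – while the key holder still can.

```python
import hmac
import hashlib
import secrets

class ControlledLinkability:
    """Per-subject re-identification keys; deleting a key 'forgets' that subject."""

    def __init__(self):
        self._keys: dict[str, bytes] = {}   # subject id -> re-identification key
        self.records: dict[str, dict] = {}  # pseudonym  -> non-identifying payload

    def store(self, subject_id: str, payload: dict) -> str:
        key = self._keys.setdefault(subject_id, secrets.token_bytes(32))
        pseudonym = hmac.new(key, subject_id.encode(), hashlib.sha256).hexdigest()[:16]
        self.records[pseudonym] = payload
        return pseudonym

    def release_key(self, subject_id: str) -> bytes:
        """Hand the key to the data subject or an authority, then delete the controller's copy."""
        return self._keys.pop(subject_id)

store = ControlledLinkability()
code = store.store("NL-123456", {"tier": "standard", "night_driving_pct": 8})
subject_key = store.release_key("NL-123456")
# The controller can no longer link `code` back to the person; the key holder still can.
```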
• Algorithmic transparency
The GDPR requires organisations to provide “meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject”. This transparency requirement applies to human as well as automated decision making. Critics argue that, where AI is involved in the processing, this is almost impossible: the algorithms used are technically complex and in some cases not even explainable (as even the makers of artificial intelligence cannot always discern how a machine has learned). Some go so far as to question whether AI will even be legal in the EU after May 2018. And while the ‘catastrophic’ impact is being overwhelmingly illuminated, the scientific community has developed several methods to address this. Not only are the mathematical methods behind AI being rigorously strengthened, interpretation engines and techniques are also being developed to demystify the notion of the black box. The best known are LIME (Local Interpretable Model-Agnostic Explanations, ref 48), which explains which data have impacted the results most, and XAI (DARPA’s Explainable AI, ref 49), a collection of techniques that produce explainable models while maintaining the results. Instead of trying to explain the complexity of the logic, it seems more valuable to provide information that actually empowers data subjects to act – and that act may not only be to opt out from processing. Binns, Van Kleek, Veale, Lyngs, Zhao and Shadbolt (ref 50) presented a paper in which they researched the effect of different explanation styles on people’s perceptions of algorithms. Especially interesting is the sensitive approach, which shows how an outcome can be changed by an individual’s actions, for example how a driver can change their own outcome: “If 10% or less of your driving took place at night, you would qualify for the cheapest tier”. Presenting consequences in a way that empowers a data subject to take action other than opting out seems a much better way to get true engagement.
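To illustrate the ‘sensitive’ explanation style with the driving example above, here is a minimal sketch that phrases an automated outcome in terms of an action the data subject can take; the pricing rule and thresholds are invented for illustration and are not taken from the cited study.

```python
def pricing_tier(night_driving_pct: float) -> str:
    """Toy automated decision: premium tier based on the share of night-time driving."""
    return "cheapest" if night_driving_pct <= 10 else "standard"

def sensitive_explanation(night_driving_pct: float) -> str:
    """Explain the outcome in terms of an action the data subject can actually take."""
    if pricing_tier(night_driving_pct) == "cheapest":
        return "You qualify for the cheapest tier."
    return (
        f"You are in the standard tier because {night_driving_pct:.0f}% of your driving was at night. "
        "If 10% or less of your driving took place at night, you would qualify for the cheapest tier."
    )

print(sensitive_explanation(night_driving_pct=23))
```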
• Territorial scope
Different countries have fundamentally different laws regarding privacy. In the EU, privacy is considered a human right, while U.S. privacy laws take a more sectoral approach (health, finance, marketing etc.). The GDPR is therefore applicable to U.S. companies only when they are processing EU citizens’ personal data. One may argue that a US company that does not offer goods or services via a website collecting the personal data of EU citizens does not have to worry about the implications of the GDPR, while one that does will simply have to separate the ‘two lines’ of business. Both approaches will likely result in a loss of value – either due to the extra costs associated with maintaining one or two closed systems, or due to missing relevant insights. Consumers’ views on privacy are changing, so even without regulations like the GDPR, a Privacy by Design approach may be a differentiating competitive factor.
Conclusion “There is a theory which states that if ever anyone discovers exactly what the Universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable. There is another theory which states that this has already happened.” Douglas Adams, The Restaurant at the End of the Universe
Like every universe, privacy is constantly changing. Hopefully this Guide can navigate you safely through the Galaxy of Privacy by Design. To meet all those whose insights made this possible, go to The Restaurant at the End of the Universe.
THE RESTAURANT AT THE END OF THE UNIVERSE 1 Boston Consulting Group, November 2012, “The Value of Our Digital Identity”, http://www.libertyglobal.com/PDF/public-policy/ The-Value-of-Our-Digital-Identity.pdf 2 EU 2014 Communication “Towards a thriving data-driven economy”, https://ec.europa.eu/digital-single-market/en/news/communicationdata-driven-economy 3 https://linc.cnil.fr/fr/reglement-europeen-protection-donnees/dataviz 4 Hadar, Irit and Hasson, Tomer and Ayalon, Oshrat and Toch, Eran and Birnhack, Michael and Sherman, Sofia and Balissa, Arod, Privacy by Designers: Software Developers’ Privacy Mindset (March 24, 2014). 23(1) Empirical Software Engineering 259-289 (2018). Available at SSRN: https:// ssrn.com/abstract=2413498 or http://dx.doi.org/10.2139/ssrn.2413498 5 “The Privacy Engineer’s Manifesto Getting from Policy to Code to QA to Value”, Authors Michelle Finneran Dennedy, Jonathan Fox, Thomas R. Finneran, DOI https://doi.org/10.1007/978-1-4302-6356-2 6 Hildebrandt, Mireille, Privacy As Protection of the Incomputable Self: From Agnostic to Agonistic Machine Learning (December 3, 2017). Available at SSRN: https://ssrn.com/abstract=3081776 or http://dx.doi.org/10.2139/ssrn.3081776 7 Hildebrandt, M. & B.J. Koops (2010), ‘The Challenges of Ambient Law and Legal Protection in the Profiling Era’, 73 Modern Law Review, DOI: 10.1111/j.1468-2230.2010.00806.x · Source: OAI 8 T.Z. Zarsky, Thinking outside the box: considering transparency, anonymity, and pseudonymity as overall solutions to the problem in information privacy in the internet society, University of Miami Law Review, 58 (2003) 1028-1032. 9 Ref 7 10 Quote Dr. Ann Cavoukian transcript https://blog.varonis.com/ interview-privacy-expert-dr-ann-cavoukian/ 11 Mattsson, Ulf T., A New Scalable Approach to Data Tokenization (June 19, 2010). Available at SSRN: https://ssrn.com/ abstract=1627284 or http://dx.doi.org/10.2139/ssrn.1627284 12 Pete Warden, O’Reilly Media’s blog “Why You Can’t Really Anonymize Your Data” (2011), https://www.oreilly.com/ideas/anonymize-data-limits 13 EU PIAF A Privacy Impact Assessment Framework for data protection and privacy rights, (2011), JLS/2009-2010/DAP/AG, www.vub.ac.be/LSTS/pub/Dehert/507.pdf 14 M. Ryan Calo, The Boundaries of Privacy Harm 86 Ind. L.J.1131, 1133 (2011)
15 H. Nissenbaum, Privacy in Context: Technology, Policy and the Integrity of Social Life (Palo Alto: Stanford University Press, 2010) 16 Solove, Daniel J., A Taxonomy of Privacy. University of Pennsylvania Law Review, Vol. 154, No. 3, p. 477, January 2006; GWU Law School Public Law Research Paper No. 129. Available at SSRN: https://ssrn.com/abstract=667622 17 NISTIR 8062 An Introduction to Privacy Engineering and Risk Management in Federal Systems by NIST (2017), https://csrc.nist.gov/publications/detail/nistir/8062/final 18 CNIL Methodology for Privacy Risk Management (2012), https://www.cnil.fr/sites/default/files/typo/document/CNILManagingPrivacyRisks-Methodology.pdf 19 LINDDUN Privacy threat modelling, https://linddun.org/ 20 Fritsch, L. (2007), “State of the art of Privacy enhancing Technology (PET)”. Norwegian Computing Center Report, No. 1013. Available at: http://publ.nr.no/4589 21 London Economics, Study on the economic benefits of privacy-enhancing technologies (PETs), (2010), https://londoneconomics.co.uk/blog/publication/study-on-the-economic-benefits-of-privacy-enhancing-technologies-pets/ 22 Koorn et al., Privacy Enhancing Technologies – Witboek voor beslissers; [R. Koorn, H. van Gils, J. ter Hart, P. Overbeek, P. Tellegen, J. Borking]; Ministry of Internal Affairs and Kingdom Relations; The Hague, 2004 23 Privacy Enhancing Technologies – A Review of Tools and Techniques, The Technology Analysis Division of the Office of the Privacy Commissioner (2017), https://www.priv.gc.ca/en/opc-actions-and-decisions/research/explore-privacy-research/2017/pet_201711/ 24 Stanford Cyberlaw PET wiki, https://cyberlaw.stanford.edu/wiki/index.php/PET 25 ENISA “Privacy by Design in Big data” (2015), https://www.enisa.europa.eu/publications/big-data-protection 26 ISO/IEC 29100:2011, https://www.iso.org/standard/45123.html 27 ISO/IEC 27550 Privacy engineering, https://www.iso.org/standard/72024.html 28 ISACA Privacy Principles and Program Management Guide, http://www.isaca.org/Knowledge-Center/Research/ResearchDeliverables/Pages/ISACA-Privacy-Principles-and-Program-Management-Guide.aspx
29 IEEE P7002™ Data Privacy Process, https://standards.ieee.org/ develop/project/7002.html 30 Dr. Ann Cavoukian, 2012, “Operationalizing Privacy by Design”, http://www.cil.cnrs.fr/CIL/IMG/pdf/operationalizing-pbd-guide.pdf 31 Yong-Sang Cho, Tore Hoel, Weiqin Chen (2016), Mapping a Privacy Framework to a Reference Model of Learning Analytics, http://www. laceproject.eu/wp-content/uploads/2015/12/ep4la2016_paper_4.pdf 32 Prinsloo, P. and Slade, S. (2017). Ethics and Learning Analytics: Charting the (Un)Charted. In Lang, C., Siemens, G., Wise, A. F., and Gaevic, D., editors, The Handbook of Learning Analytics, pages 49–57. Society for Learning Analytics Research (SoLAR), Alberta, Canada, 1 edition 33 OASIS Privacy Management Reference Model (PMRM) TC, https:// www.oasis-open.org/committees/tc_home.php?wg_abbrev=pmrm 34 OASIS Privacy by Design Documentation for Software Engineers (PbD-SE) TC, https://www.oasis-open.org/committees/tc_home. php?wg_abbrev=pbd-se 35 MITRE Privacy Engineering Framework 2014, https://www.mitre.org/ publications/technical-papers/privacy-engineering-framework 36 PRIPARE, Preparing Industry to Privacy-by-design by supporting its Application in Research, http://pripareproject.eu/ 37 J.-H. Hoepman: Privacy design strategies, eprint arXiv:1210.6621 (October 2012) 38 London economics, “Analysis of the potential economic impact of GDPR – October 2017”, https://londoneconomics.co.uk/blog/ publication/analysis-potential-economic-impact-gdpr-october-2017/ 39 Campbell, James David and Goldfarb, Avi and Tucker, Catherine E., Privacy Regulation and Market Structure (August 15, 2013). Available at SSRN: https://ssrn.com/abstract=1729405 or http://dx.doi. org/10.2139/ssrn.1729405 40 Gómez, David & Rojas, Alfonso. (2015). An Empirical Overview of the No Free Lunch Theorem and Its Effect on Real-World Machine Learning Classification. Neural computation. 28. 1-13. 10.1162/ NECO_a_00793. 41 Dr. Anthony Scriffignano, Chief Data Scientist at Dun & Bradstreet, and Dr. David Bray, Executive Director at People-Centered Internet, CXOTalk episode 270, “Data, AI, and Algorithms: New Year’s Resolutions for 2018” January 2018, https://www.cxotalk.com/ episode/data-ai-algorithms-new-years-resolutions-2018 42 DOE/Lawrence Berkeley National Laboratory. “’Minimalist machine learning’ algorithms analyze images from very little data: CAMERA
researchers develop highly efficient neural networks for analysing experimental scientific images from limited training data.” ScienceDaily. ScienceDaily, 21 February 2018. <www.sciencedaily. com/releases/2018/02/180221122909.htm>. 43 Yves-Alexandre de Montjoye, Ali Farzanehfar, Julien Hendrickx and Luc Rocher, « Solving Artificial Intelligence’s Privacy Problem », Field Actions Science Reports [Online], Special Issue 17 | 2017, Online since 31 December 2017, connection on 05 March 2018. http:// journals.openedition.org/factsreports/4494 44 Open Algorithms (2017), OPAL, http://www.opalproject.org/ 45 “Uniquely Generation Z: What brands should know about today’s youngest consumers.” IBM Institute for Business Value. January 2017. https://www-935.ibm.com/ services/us/gbs/ thoughtleadership/uniquelygenz/ 46 “Gen Z brand relationships Authenticity matters.” IBM Institute for Business Value 2017. http://www-935.ibm.com/services/us/gbs/ thoughtleadership/genzbrand/ 47 The German Government’s Digital Summit, White Paper on Pseudonymisation Drafted by the Data Protection Focus Group 2017, https://www.eprivacy.eu/fileadmin/Redakteur/News/2017_ Data_Protection_Focus_Group-White_Paper_Pseudonymisation.pdf 48 Introduction to Local Interpretable Model-Agnostic Explanations (LIME). A technique to explain the predictions of any machine learning classifier. By Marco Tulio RibeiroSameer Singh, Carlos Guestrin August 12, 2016, https://www.oreilly.com/learning/ introduction-to-local-interpretable-model-agnostic-explanations-lime 49 https://www.darpa.mil/program/explainable-artificial-intelligence 50 Reuben Binns, Max Van Kleek, Michael Veale, Ulrik Lyngs, Jun Zhao and Nigel Shadbolt (2018) ‘It’s Reducing a Human Being to a Percentage’; Perceptions of Justice in Algorithmic Decisions. ACM Conference on Human Factors in Computing Systems (CHI’18), April 21–26, Montreal, Canada. doi: 10.1145/3173574.317395 51 J. Indumathi (2012). A Generic Scaffold Housing the Innovative Modus Operandi for Selection of the Superlative Anonymisation Technique for Optimized Privacy Preserving Data Mining, Data Mining Applications in Engineering and Medicine, Associate Prof. Adem Karahoca (Ed.), InTech, DOI: 10.5772/49982. Available from: https://www.intechopen.com/books/data-mining-applications-inengineering-and-medicine/a-generic-scaffold-housing-the-innovativemodus-operandi-for-selection-of-the-superlative-anonymisat 52 Hintze, Mike and LaFever, Gary, Meeting Upcoming GDPR Requirements While Maximizing the Full Value of Data Analytics (January 2017). Available at SSRN: https://ssrn.com/ abstract=2927540 or http://dx.doi.org/10.2139/ssrn.2927540
“What do I mean by who am I?” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
PUBLISHED BY Protegrity: Proven Experts in Data Security Protegrity is the only enterprise data security software platform that combines machine learning, data discovery and classification tools, with scalable, data-centric encryption and pseudonymisation technologies, to help businesses secure sensitive information everywhere while maintaining its usability. Built for complex, heterogeneous business environments, Protegrity provides unprecedented levels of data security for applications, data warehouses, mainframes, big data and the cloud with the industry’s first all-in subscription solution. Companies trust Protegrity to help them identify, locate and protect sensitive data by design and by default, enterprise wide, to reduce risk, manage privacy, achieve compliance, enable business analytics and confidently adopt new platforms. For additional information visit www.protegrity.com
www.protegrity.com
“Arthur blinked at the screens and felt he was missing something important. Suddenly he realised what it was.” Douglas Adams, The Hitchhiker’s Guide to the Galaxy
Like Arthur Dent in The Hitchhiker’s Guide to the Galaxy, organisations around the world are suddenly realising they have been missing something important: that people care about their privacy. As digital citizens we have no choice but to trust the educators, healthcare providers, government agencies, financial institutions, retailers, social media platforms, tech firms and communications companies we engage with daily via mobile apps, websites and connected devices. The reality is that those we entrust with our personal information are the only ones who can truly safeguard our privacy. In this context, the exponential growth in personal data and the analysis of it has led to increasingly rigorous legislation that has globally heightened a sense of organisational responsibility. Europeans have taken the lead here with their General Data Protection Regulation, but enterprises around the world are concerned about honouring their responsibilities as custodians of our personal information, so it is my very great pleasure to present “the answer to the great question”: The Hitchhiker’s Guide to Privacy by Design, by Barbara Peruskovic. Suni Munshani, Protegrity CEO
Published by Protegrity 2018
DON’T PANIC