Anonymization of personal data


HANDBOOK | LAW | DATA PROTECTION LAW

A cross-sector practical guide for industrial companies





Table of Contents

Preface
1. Introduction
2. An overview of the “anonymization” of personal data
3. Terms
3.1 Personal data
3.2 Processing of personal data
3.3 Diversity of the term “anonymization”
3.3.1 De facto anonymization
3.3.2 Absolute anonymization
3.3.3 Formal anonymization
3.3.4 State of the art
3.4 De-identification
3.5 Re-identification
3.6 Pseudonymization and delimitation from de facto anonymization
4. Legal consequences of effective anonymization
5. Effective anonymization – legal framework
5.1 GDPR requirements
5.2 Objective factors for assessing sufficient de facto anonymization
5.3 Possible levels of de-identification measures
5.4 Erasure of the original data set
6. Technical requirements for effective de facto anonymization
6.1 Overview of de-identification techniques
6.1.1 Removal of identifiers
6.1.2 Randomization
6.1.3 Generalization/aggregation
6.2 Formalized anonymization criteria
6.2.1 Differential privacy
6.2.2 k-anonymity
6.2.3 l-diversity and t-closeness
6.3 Effectiveness of anonymization
6.4 Selection of the anonymization method
6.4.1 What kind of data sets are involved?




6.4.2 For which use case is the data being anonymized?
6.4.3 Levels of anonymization
6.4.4 Review
6.5 Regular review of the anonymization method
7. Organizational implementation of the de facto anonymization by third parties
7.1 Organizational measures
7.2 Third-party liability under data protection law
7.2.1 Processor
7.2.2 Joint control
7.2.3 (Separate) controllers
7.3 Special features of de-identification within the group/a group of companies
8. Legality of de-identification measures
8.1 Admissibility of de-identification measures in the event of a “change of purpose”
8.2 Obligations to review the legality of the (initial) collections
9. Further data protection requirements with regard to de facto anonymization
9.1 Information obligations according to Article 13, 14 GDPR
9.2 Documentation
9.3 Data protection impact assessment
Imprint




Preface

The importance of data for the digital transformation of German industry is increasingly becoming the focus of digital policy discussion. In its European strategy for data, the EU Commission aims at creating a single market for data based on European values and standards, which particularly addresses the high level of data protection in Europe. In order to turn the national and European data economy into a successful model with a high level of data protection, Europe must now find practical solutions for the use of anonymized data as quickly as possible. After all, anonymized data holds great potential for the economic value chain if it is tapped using statistical and analytical methods. This does not necessitate compromising the level of data protection.

Many industrial companies are under constantly increasing competitive pressure to optimize their data-based production and business processes. For the development of digital business models, they are dependent on the use of anonymized data. In the absence of a sufficiently differentiated legal framework and in view of the lack of technical standards, this is difficult in practice. GDPR-compliant anonymization of personal data remains a risky undertaking in view of the likely inconsistent interpretation by data protection authorities and the severe fines in the event of a data breach. When in doubt, companies therefore often shy away from the efficient use of anonymized data. Concerns about violating data protection regulations should no longer prevent companies from developing digital business models. Europe must significantly increase the pace of legally compliant data handling if it is serious about closing the digitalization gap and achieving the required level of data usage.

This cross-sector guide is intended to clarify some fundamental questions for companies. Best practice examples from industry provide guidance on how to anonymize personal data in the most legally secure way possible. We would be delighted if this guide can contribute to the better use of anonymized data in practice.

Iris Plöger

Dr Bertram Burtscher

Member of the Executive Board, Federation of German Industries

Partner, Head of the TMT Sector Group (Vienna), Freshfields Bruckhaus Deringer LLP



01

Introduction

The data protection requirements of the GDPR and the German Federal Data Protection Act (BDSG) apply to “personal data,” i.e. information that relates directly or indirectly to natural persons. The processing of personal data is only permitted if there is a legal basis. In addition, the data protection regulations impose further obligations on the data controller, which are intended, for example, to safeguard the principles of lawfulness, fairness and transparency, purpose limitation, data minimization, integrity and accountability. Not every analysis or other use of personal data that is useful or necessary for digital innovation or other economic applications is therefore permitted under data protection law. On closer inspection, however, it is in many cases not necessary to use personal data at all. If the data is “valuable” for the controller or third parties even without information relating to identified or identifiable natural persons, anonymization provides the option of using this data without being subject to the strict data protection regulations.

One major challenge is that there are hardly any specific guidelines explaining the circumstances under which data will be considered anonymous in the age of big data. In particular, the term “anonymization” is used very inconsistently in practice. For example, replacing the initial letters of names in court decisions has been referred to as “anonymization,” although from a legal point of view this is at best a (weak) pseudonymization.1

This guide is addressed to companies in the private sector and is intended to provide an overview of the data protection framework for possible anonymization measures. For public administration and in particular law enforcement authorities, additional requirements may apply that are not the subject of this guide. Sector-specific particularities (e.g. for financial institutions, telecommunications services or the health, pharmaceutical or automotive industries) are not discussed in detail, as further framework conditions may have to be taken into account. This guide cannot replace legal advice in specific cases, nor does it cover all issues in this context. It is also expressly not a technical manual for the concrete implementation of anonymization measures. The editors and authors are not liable for the content of this guide.

1 Cf. for example Berlin Administrative Court, ruling of 27 February 2020 – 27 L 43/20, ZD 2020, 324.




In an increasingly digitized society, data is of immense importance. The data strategy of the European Commission of February 2020 likewise sees enormous advantages of big data for European citizens, especially in the area of mobility. Through the creation of a single market for data, data should be able to be shared for the benefit of companies, researchers and public administration.2 However, this change presents companies with growing challenges: driving digital innovation while at the same time complying with the complex requirements for the protection of personal data remains difficult for commercial companies in all sectors, even two years after the General Data Protection Regulation (GDPR), supplemented by the provisions of the German Federal Data Protection Act (BDSG), came into effect, not least because of the lack of binding guidance on the practical implementation of the sometimes new and rigorously sanctioned legal requirements.

This guide provides decision-makers and specialists in the corporate functions entrusted with data protection issues, such as IT, legal, compliance and data protection, with quick access to the concept of “anonymization” and, drawing on specific solution strategies from a broad spectrum of industry, seeks to outline a path towards best practices for achieving compliance with the legal requirements for effective anonymization.

Against this background, industry has an enormous interest in anonymizing data in order to remove it from the scope of the data protection regulations and thus to benefit from data-based value creation. This guide supports companies in managing this task.

2 European data strategy – The EU as a model for a digital society, available at https://ec.europa.eu/info/strategy/priorities-2019-2024/europe-fit-digital-age/european-data-strategy_de.



02

An overview of the “anonymization” of personal data

The regulations of data protection law in the European Union (EU) and in Germany place certain requirements on the processing of personal data. In particular, the processing of personal data is subject to strict rules, i.e. a legal basis is always required for any form of processing (cf. Articles 6 and 9 GDPR). In addition, there are further data protection obligations, such as the obligations to fulfill information and data subject rights (cf. Articles 12 to 23 GDPR) and to protect personal data by implementing appropriate technical and organizational measures (cf. Article 32 GDPR). Data that has no personal reference (for the concept of personal data, cf. 3.1) is excluded from the substantive scope of the GDPR (cf. Article 2 (1) GDPR). If personal data is anonymized in such a way that it “loses” its personal reference, the data protection regulations are no longer applicable (cf. recital 26 GDPR). Anonymization means nothing more than the (step-by-step) removal of the personal reference – this process is described in this guide as “de-identification” of data (see 3.4) – until a sufficient degree of de-identification is reached (de facto anonymization, see 3.3.1).

Anonymization within the meaning of the GDPR must be distinguished from other ways of reducing the personal reference or making re-identification more difficult which, taken alone, do not lead to a sufficient removal of the personal reference. These measures include, in particular, “pseudonymization.” In the case of pseudonymization, information can still be attributed to a natural person by using certain additional information; there is thus still a personal reference, and the data protection regulations continue to apply (see 3.6 for the delimitation).


03

Terms

In legal and technical literature, the terms relating to anonymization are used inconsistently, which increases legal uncertainty with regard to GDPR-compliant anonymization. The most important terms are defined below and are hereinafter used only in this sense in the context of this guide.

3.1 Personal data

Personal data means any information relating to an identified or identifiable natural person (cf. Article 4 no. 1 GDPR). What is crucial for identifiability is whether the information can be attributed to a natural person directly or indirectly, e.g. by reference to an identifier such as a name, a number, location data or other attributes. If, after a certain identifier (e.g. a name) has been removed, data can still be attributed to a natural person by reference to further identifiers (e.g. a job title and company, if this position only exists once in the company) or by consulting other additional information (e.g. an IP address together with information from the provider on the identity of the user behind it), this data remains personal.

Note: In many cases, individual data cannot, despite certain individual attributes, be clearly attributed to a data subject at first glance, but can be attributed to a certain group of people, and a natural person can then be identified because the group is too small (for example, re-identification is possible if a “female member of the executive board” of a group is mentioned and there are only one or two women on this executive board) or because the data, combined with other (available) data, identifies a specific person (e.g. for an employee: salary level in connection with starting date; for a patient: gender, zip code and a rare diagnosis). In these cases, there is still a personal reference due to the identifiability of a person.
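To make the note above concrete, the following minimal sketch shows how a handful of quasi-identifying attributes can single out a person when a supposedly de-identified data set is linked with separately available information. All records, names and field names are hypothetical, and the linking logic is deliberately simplified.

```python
# Minimal sketch: why quasi-identifiers can re-identify a person.
# All records, names and fields are hypothetical.

medical_records = [  # "de-identified" data: the name column was removed
    {"gender": "f", "zip": "10115", "diagnosis": "influenza"},
    {"gender": "m", "zip": "10115", "diagnosis": "influenza"},
    {"gender": "f", "zip": "10115", "diagnosis": "rare_disease_x"},
]

public_register = [  # separately available data, e.g. a public directory
    {"name": "A. Schmidt", "gender": "f", "zip": "10115"},
    {"name": "B. Meier",   "gender": "m", "zip": "10115"},
]

def link(records, register):
    """Attach a name wherever gender + zip single out exactly one person."""
    for rec in records:
        matches = [p for p in register
                   if p["gender"] == rec["gender"] and p["zip"] == rec["zip"]]
        if len(matches) == 1:  # group size 1 -> the person is identified
            yield {**rec, "name": matches[0]["name"]}

for hit in link(medical_records, public_register):
    print(hit)  # the rare diagnosis is now attributed to A. Schmidt
```

Because each gender/zip group in the hypothetical register contains only one person, every record, including the rare diagnosis, becomes attributable; the data was therefore never anonymous in the first place.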

3.2 Processing of personal data

Data protection law is linked to the “processing” of personal data. The term processing has a broad meaning and includes any operation that is “performed on” personal data (cf. Article 4 no. 2 clause 1 GDPR). In practice, there are hardly any conceivable operations in the handling of personal data that do not fall under this broad definition. The (mere) collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction of personal data each constitutes processing (cf. Article 4 no. 2 clause 1 GDPR). Different opinions are expressed on the question of whether the anonymization of personal data, i.e. the withdrawal of the personal reference, itself constitutes data processing. Since the anonymization process influences the personal reference of a set of data – as does the erasure or pseudonymization of data (cf. Article 4 nos. 2, 5 GDPR) – it is argued that the process should be treated as processing within the meaning of the GDPR.3 Against this, it is noted that anonymization precisely does not involve a data protection-relevant process, as the anonymization process is privileged by the GDPR itself and should therefore not be subject to its requirements.4 In the context of the consultation process of the Federal Commissioner for Data Protection and Freedom of Information (BfDI), the BDI very clearly concurred with this opposing view.5 Nevertheless, due to the broadly worded concept of processing and in the absence of case law of the highest courts, no conclusively reliable statement can be made at the moment as to how anonymization is to be classified in relation to the legal definition of processing within the meaning of the GDPR.

3 The Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 5.

4 The anonymization of the data is typically also in the interests of the data subject or at least does not run counter to these interests; Hornung/Wagner, ZD 2020, 223 (224) with further references on the dispute.

5 BDI e.V., opinion on the BfDI consultation procedure “Anonymization of personal data” dated 23 March 2020, available at: https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsulation-Anonymisierung-TK/Stellungnahmen/BDI.pdf?_blob=publicationFile&v=1.




Following the opinion of the BfDI and treating anonymization as a form of processing within the meaning of Article 4 no. 2 GDPR,6 all requirements of the GDPR must also be complied with for the anonymization process itself. Against this background, this guide also contains explanations at various points of possible legal bases and secondary obligations associated with the processing operations.


3.3 Diversity of the term “anonymization”

In practice, various designations are used for the term “anonymization,” alternatively and sometimes synonymously, some of which denote different degrees of de-identification of sets of data and are not always synonymous with GDPR-compliant, i.e. sufficient, de facto anonymization. In some cases, anonymization is used as a generic term for all reductions of the personal reference. In the context of this guide, however, “anonymization” only covers de-identification measures that lead to a removal of the personal reference in the sense that the data protection regulations are no longer applicable. These are the methods of de facto anonymization described below. Absolute anonymization would also represent “GDPR-compliant” anonymization, but it is neither required by the GDPR nor can it be implemented in the vast majority of cases.7

3.3.1 De facto anonymization

De facto anonymization (sometimes also referred to as relative anonymization) describes de-identification operations by which so many identifiers are removed, and further techniques to reduce the personal reference (e.g. randomization or generalization, see 6.1) are applied, that re-identification with reasonable efforts in accordance with the current state of the art (see 3.3.4) is no longer possible and the personal reference is thereby eliminated (see Chapter 5 for more details). For the term “reasonable efforts” see 5.2.


6 The Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 5.

7 See in detail on the legal requirements for anonymization, section 5.1.

3.3.2 Absolute anonymization

Absolute anonymization exists if the de-identification leads to the complete loss of any personal reference and re-identification is absolutely excluded for everyone, even from the overall context and with the greatest possible effort, i.e. using all even theoretically conceivable techniques regardless of the state of the art, the probability of re-identification, the costs, the required effort and duration, and by means of all possible additional information. In most cases, however, due to the diverse (digitally available) data sources, today’s information technology with its ever easier linkability of data and the continuous increase in available computing power, the absolute, irreversible loss of any personal reference appears impossible. Personal data that leaves traces in the digital world in the form of a digital footprint, caused by use of the internet or even a mobile phone, can hardly ever be removed completely. At least if purely theoretically conceivable re-identification techniques are considered, personal data can almost never be anonymized in practice in such a way that a restoration of the personal reference can be absolutely excluded (see also 5.1).

Example: Absolute anonymization can be assumed, for example, when data from a very large number of customers from all population groups and various geographical regions (e.g. in a global survey) is aggregated. If customers are asked about their satisfaction with a product, where the only answer options are “thumbs up” or “thumbs down,” there is a sufficiently large number of answers for these two options and the individual answers are completely destroyed after the evaluation (i.e. including all copies and any interim results), the survey result (for example, 58% of customers worldwide voted thumbs up) could no longer be traced to an identified person under any circumstances.
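The following minimal sketch mirrors this survey example, under the assumption that only the aggregate result is kept and every individual answer (including copies) is destroyed after evaluation; the data is hypothetical.

```python
# Minimal sketch of aggregation with destruction of the raw answers.
# The votes below are hypothetical.
votes = ["up", "down", "up", "up", "down", "up", "up", "up"]

share_up = votes.count("up") / len(votes)  # only this aggregate is retained
votes.clear()                              # destroy the individual answers

# e.g. "75% of customers voted thumbs up" - no longer traceable to anyone
print(f"{share_up:.0%} of customers voted thumbs up")
```

Note that in a real system, “destroying” the raw answers also means erasing all copies, backups and interim results, which a single in-memory deletion like this can only hint at.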



3.3.3 Formal anonymization

The weakest form of de-identification is the removal or omission of directly identifying attributes from the set of data, such as the real name or a personalized email address. The other attributes (for example, so-called quasi-identifying attributes such as a user ID, ID number, VIN, etc.) are retained. This technique is also known as formal anonymization and regularly does not lead to GDPR-compliant de facto anonymization. Rather, formal anonymization is often just a form of pseudonymization, since reference to the removed identifying feature still allows the data to be attributed to a natural person. Formal anonymization is therefore generally considered inadequate and is limited, for example in the German Federal Statistics Act (BStatG),8 to certain groups of recipients: Section 16 para. 6 no. 2 BStatG only grants access to formally anonymized individual information within specially secured areas of the Federal Statistical Office and the statistical offices of the states (Länder), and only if effective precautions are taken to maintain confidentiality. Authorized persons can only be public officials who are specially sworn to public service. In order to at least make re-identification on the basis of so-called quasi-identifying attributes and the detection of identifying features more difficult, further measures are necessary.9

Example: As part of a customer satisfaction survey, the answers are tabulated together with the customers’ names. Only the column with the name is then deleted from the table. In these cases, it is possible that the information provided by the customers (e.g. name or address in a free text field) and further information in the table (e.g. customer number or product number), in combination with further information (e.g. an overview of transactions with date and product number, or the fact that the delivery address is a single-family house in which only one person lives), will permit the attribution of individual details to specific customers.

8 Law on Statistics for Federal Purposes (Federal Statistics Act – BStatG) in the version promulgated on 20 October 2016 (Federal Law Gazette I page 2394).

9 Jan Dennis Gumz, Mike Weber, Christian Welzel (Kompetenzzentrum für Öffentliche IT [Competence Center for Public IT]), Anonymisierung: Schutzziele und Techniken [Anonymization: protection goals and techniques], page 10, available at https://cdn0.scrvt.com/fokus/784daae14fc72f91/bcebf7142066/Anonymisierung---Schutzziele-und-Techniken.pdf.
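The following minimal sketch illustrates why this is only formal anonymization: the name column is dropped, but a retained quasi-identifier (here a hypothetical customer number) still links the answers back to a person via other tables the company holds.

```python
# Minimal sketch of *formal* anonymization on a hypothetical survey table:
# only the directly identifying column is dropped; quasi-identifying
# attributes such as the customer number remain and still allow attribution.

survey = [
    {"name": "C. Huber", "customer_no": 4711, "rating": 2,
     "free_text": "Please deliver to Lindenweg 5 next time!"},
]

def formally_anonymize(rows):
    """Remove only the direct identifier - usually NOT de facto anonymization."""
    return [{k: v for k, v in row.items() if k != "name"} for row in rows]

# the order system elsewhere in the company still maps customer numbers to names
transactions = {4711: "C. Huber"}

for row in formally_anonymize(survey):
    # the customer number (and even the free text) re-identifies the customer
    print(transactions[row["customer_no"]], "->", row["rating"])
```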

3.3.4 State of the art

The state of the art within the meaning of the GDPR comprises the recognized rules of technology that have already spread and proven themselves in practice. The term can be found not only in the GDPR but also in national regulations such as Section 109 para. 1 of the German Telecommunications Act (TKG), Section 13 para. 7 of the German Telemedia Act (TMG) and Section 8a para. 1 sentence 2 of the German Act to Strengthen the Security of Federal Information Technology (BSIG). To determine the state of the art, recourse is taken to state-of-the-art approaches or the related best available techniques. Within the GDPR, the term “state of the art” is to be understood as a dynamic reference, since the state of the art is constantly evolving. The obligation to take the state of the art into account is accordingly subject to an upper limit: the latest scientific trends and prototypes do not have to be taken into account as soon as they are discussed in the specialist community; rather, what matters is whether techniques have become established in practice. At the same time, the controller or the processor cannot assume that the state of the art will remain static. It is therefore necessary to regularly review the anonymization method (see 6.4). Standards can also be utilized for such a review in order to meet one’s own due diligence obligations by resorting to measures, techniques and processes that have already been assessed as appropriate by expert panels.10 The following standards related to anonymization are currently being developed or have already been published:11

10 DIN Standards Committee for Information Technology and Applications, opinion on the BfDI consultation procedure on anonymization under the GDPR with special consideration of the telecommunications industry, available at https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsultation-Anonymisierung-TK/Positionspapier-Anonymisierung-GDPR-TKG.html;jsessionid=F53DFE83354223C285E7093AF6EC59B4.2_cid507?nn=5216976.

11 DIN Standards Committee for Information Technology and Applications, opinion on the BfDI consultation procedure on anonymization under the GDPR with special consideration of the telecommunications industry, available at https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsultation-Anonymisierung-TK/Stellungnahmen/DIN.pdf;jsessionid=B5D45D1FD485BC3389BEFD856DE7F466.2_cid329?_blob=publicationFile&v=1.



- ISO/IEC 20889:2018 “Privacy enhancing data de-identification terminology and classification of techniques”;
- ISO/IEC 27555 “Guidelines on personally identifiable information deletion” (expected publication: December 2021).

3.4 De-identification

De-identification describes the process at the end of which the personal reference of previously personal data is reduced or ultimately even removed. The result of a de-identification can also be a “mere” pseudonymization, i.e. where a re-establishment of the personal reference is still possible (albeit with a certain amount of effort). The more far-reaching the de-identification, the more likely it is that the result qualifies as a GDPR-compliant de facto anonymization. Whenever methods or techniques of anonymization are mentioned in this guide, methods or techniques of de-identification are meant.

3.5 Re-identification

In contrast to de-identification, re-identification stands for the restoration of the personal reference, i.e. the reversal of the process that led to the reduction or removal of identifiers. Re-identification is not possible with absolute anonymization; with de facto anonymization it is possible only with a disproportionate amount of effort or after a further development of the state of the art.

3.6 Pseudonymization and delimitation from de facto anonymization

Pseudonymization refers to a process of de-identification as a result of which data can only be attributed to a specific data subject with the use of additional available information (the original data set or a “key”) (cf. Article 4 no. 5 GDPR). The personal reference is therefore not irreversibly removed; rather, the data is only (temporarily) “decoupled” from the associated data subjects and the personal reference is limited, but the natural persons behind the information can still be identified (via the key).

Example: In a document, real names are replaced by numbers. A second, separate document contains a list of which number represents which name.



Due to the possibility of re-identifying the data without great effort (e.g. by obtaining the re-identification key), pseudonymization is not a form of de facto anonymization. Rather, pseudonymization is a technical and organizational measure to protect personal data that prevents, or at least restricts, the unhindered attribution of sets of data to data subjects, in particular by third parties (for example, by granting only a certain selected number of people access to a re-identification key) (cf. Article 32 para. 1 lit. a GDPR). Pseudonymization therefore comprises de-identification measures that remove the direct personal reference in the set of data but at the same time keep open the possibility of subsequent attribution (for example, the use of initials instead of names). Through the additional use of further anonymization techniques (see Chapter 6), but also through additional organizational measures (e.g. restriction of access to the de-identification methods), pseudonymized data can be de facto anonymized. Whether a given piece of data is personal, anonymous or pseudonymous depends greatly on the circumstances and framework conditions of the individual case. The same piece of data can therefore be classified differently for different data controllers.

Example: A set of data that contains the IP addresses of device users is personal data for the telecommunications service provider, as it can make the attribution. For other persons, however, the same data set is (at least) pseudonymized, since they cannot attribute it to specific persons without additional information.
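A minimal sketch of pseudonymization along the lines of both examples above: identifiers are replaced by random pseudonyms while a separately stored mapping (the “key”) allows the key holder to re-identify. Field names and values are hypothetical.

```python
# Minimal sketch of pseudonymization (Article 4 no. 5 GDPR): names are
# replaced by random pseudonyms, and the mapping ("key") is kept separately.
# Whoever holds the key can re-identify, so the data remains personal.
import secrets

def pseudonymize(rows, field="name"):
    key = {}  # pseudonym -> original value; must be stored separately
    out = []
    for row in rows:
        pseudonym = "P-" + secrets.token_hex(4)
        key[pseudonym] = row[field]
        out.append({**row, field: pseudonym})
    return out, key

rows = [{"name": "D. Wagner", "salary_band": "E13"}]
pseudonymized, key = pseudonymize(rows)

print(pseudonymized)                  # usable for analysis without real names
print(key[pseudonymized[0]["name"]])  # the key holder restores "D. Wagner"
```

Whether such data counts as pseudonymous or (for parties without any access to the key) effectively de-identified depends, as described above, on the circumstances of the individual case.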


04

Legal consequences of effective anonymization

In case of sufficient de facto anonymization, the use of the anonymized data is excluded from the scope of the GDPR, as the data is no longer personal (see 5.1 below). However, other regulations outside of data protection law that are not, or only indirectly, linked to a personal reference may continue to apply or may have to be observed before anonymization. For example, certain information is particularly protected because it is subject to postal or telecommunications secrecy. Certain Union and national regulations also provide for anonymization or must be observed in connection with the anonymization process. For example, the processing of traffic data within the meaning of Section 3 no. 30 TKG, i.e. data that is collected, processed or used during the provision of a telecommunications service, is governed by Sections 91 et seqq. TKG. Location data that is used in relation to the users of public telecommunications networks or publicly accessible telecommunications services may only be processed to the extent necessary for the provision of value added services (e.g. location services) and within the period required for this, if it has been anonymized or if the subscriber has given their consent to the provider of the value added service (cf. Section 98 para. 1 sentence 1 TKG). Further relevant legal bases with regard to data protection requirements in electronic communication (ePrivacy) or special regulations can be found in Section 282 of the German Social Code (SGB) III; Sections 276, 277, 287 SGB V; Section 64 SGB VIII; Sections 67c, 75 SGB X; Sections 79, 84, 85, 92a, 98, 115, 117 SGB XI; Sections 8d, 9a, 12a, 14, 15, 15g of the German Transplantation Act (TPG); Section 150b of the German Trade Regulation Act (GewO); Section 467 of the German Code of Criminal Procedure (StPO); Sections 26, 29 of the German Federal Police Act (BPolG); Section 38 of the German Road Traffic Act (StVG); Section 32 of the German Implant Registry Act (IRegG) and Section 13 of the German Infection Protection Act (IfSG).

In addition, there are other protective regulations that can also affect the access to or exclusivity of data, for example under copyright or trademark law, trade secret protection or antitrust law. Furthermore, contractual stipulations may have to be observed. Reference may also be made to Regulation (EU) 2018/1807 on the free flow of non-personal data. This regulation applies to the processing of electronic data other than personal data if the data is offered to users in the EU, regardless of where the service provider is domiciled, and to natural and legal persons in the EU who process such non-personal electronic data for their own needs. The regulation applies directly and makes various provisions, in particular to remove obstacles to the cross-border traffic of non-personal data within the EU and to promote the data economy, such as a ban on data localization requirements. The presentation of these further legal restrictions is not the subject of this guide.

The de facto anonymization of personal data leads to the inapplicability of the GDPR, cf. Article 2 (1) GDPR and recital 26 GDPR. However, if the “anonymization” is insufficient and the personal reference has not been at least de facto removed, the provisions of the GDPR remain applicable. If re-identification with reasonable efforts becomes possible at a later time due to a further development of technology, the degree of de-identification is no longer sufficient for de facto anonymization. From this point in time, there is (again) a personal reference, so that the provisions of the GDPR are again applicable. In such cases, in which the personal reference can only be restored with the passage of time (for example through further development of the state of the art), it would also be conceivable to question the effectiveness of the anonymization from the beginning (retrospectively); such an approach, however, would not be in accordance with the concept of “de facto anonymization” chosen in the GDPR, which specifically does not call for an absolute (permanent) removal of the personal reference.



05

Effective anonymization – legal framework

5.1 GDPR requirements

The GDPR does not contain a clear definition of the term anonymization, but anonymization is mentioned in recital 26: “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments.” For the purposes of this provision, a “de facto” anonymization concept (hence the term “de facto anonymization”) is to be used as the basis for ascertaining effective anonymization, rather than an “absolute” one (for the term “absolute anonymization” see 3.3.2). When assessing whether a de-identification of personal data leads to an effective anonymization in accordance with the requirements of the GDPR, it must be taken into account, in addition to the “means” available to the controller or to third parties (i.e. techniques, other options and further information), whether it is reasonably likely, with still reasonable effort in the individual case, that a specific person can be re-identified from the data set.12 That is to say, it is also decisive, in the sense of a probability analysis at the time the de-identification up to de facto anonymization is performed, whether the controller must expect that someone will make the effort to restore at least a certain personal reference.

12 The BfDI also does not consider absolute anonymization to be necessary under data protection law; it is sufficient to remove the personal reference in such a way that re-identification is virtually impossible; cf. the Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 4.

According to the current state of the art and the expected technical developments, personal data can in fact almost never be de-identified in such a way that a restoration of the personal reference can be ruled out using all theoretically possible means, even in the future, and with any – even immensely high – effort (see also the description of the term “absolute anonymization,” 3.3.2). Besides, it is not in the interests of the EU, the federal government, the private sector, employees or, ultimately, the natural persons who benefit from data-driven innovations if anonymization, and thus the use of data, is de facto excluded by placing excessive demands on the standard for anonymization. Absolute anonymization, in the sense that the restoration of the personal reference is not possible for anyone, will therefore usually be impossible and is not actually required under data protection law.13 In conclusion, a de facto anonymization to the extent that is appropriate based on the nature of the data and the other objective factors must be regarded as sufficient to meet the requirements of the GDPR.

5.2 Objective factors for assessing sufficient de facto anonymization

When assessing whether a sufficient degree of de facto anonymization has been achieved, recital 26 GDPR only provides starting points with regard to possible re-identification: all “objective factors” should be taken into account. These objective factors of a possible re-identification are, in particular, the state of the art (see 3.3.4), foreseeable technical developments (see 6.5), the costs and the time/effort required for a re-identification, and the probability, resulting from these and other circumstances, that someone will make the effort of re-identification.

13 The Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 4.




The standard for this evaluation is the potential “attacker.” An “attacker” here is a third party (i.e. neither the data controller nor the processor) who accesses the original data sets accidentally or intentionally.14

The factors of cost and time must each relate to the means and effort that a “re-identification attacker” would plausibly invest in an impermissible re-identification. Taken alone, however, these factors are not meaningful or sufficient for assessing whether de facto anonymization has been achieved in a specific case. Rather, the means, costs and effort of a possible re-identification at the respective assessment time can only be categorized or quantified if they are compared with other factors. For example, the time and costs involved in re-identification can be considerable in absolute terms, but very low compared to the potential damage or gain that the re-identified data could bring. Correspondingly, the Article 29 Data Protection Working Party stated, in relation to the Data Protection Directive as the predecessor of the GDPR, that “the likelihood and severity of the impacts of a reversal need to be taken into account.”16 Therefore, other objective factors should also be taken into account which, according to general life experience, can make data appear “valuable” to potential “re-identification attackers.”

Some supervisory authorities15 have further specified the standards for reviewing adequate de facto anonymization:

Irish Data Protection Commission: “...if it can be shown that it is unlikely that a data subject will be identified given the circumstances of the individual case and the state of technology, the data can be considered anonymous.” “...the duty of organisations is to make all reasonable attempts to limit the risk that a person will be identified.”

Austrian data protection authority: Anonymization must ensure that neither the controller nor a third party can restore a personal reference without disproportionate effort.

Spanish Agencia Española de Protección de Datos: “... in order to anonymise a file, the corresponding data should be such as not to allow the data subject to be identified via ‘all,’ ‘likely’ and ‘reasonable’ means by the data controller or by any third party. Therefore anonymisation procedures must ensure that not even the data controller is capable of re-identifying the data holders in an anonymised file.”

14 Article 29 Working Party, WP 216: Opinion 5/2014 on Anonymization Techniques, page 16, available at https://datenschutz.hessen.de/sites/datenschutz.hessen.de/files/wp216_de.pdf.

15 Austria, available at https://www.ris.bka.gv.at/Dokumente/Dsk/DSBT_20181205_DSB_D123_270_0009_DSB_2018_00/DSBT_20181205_DSB_D123_270_0009_DSB_2018_00.pdf; Ireland, available at https://www.dataprotection.ie/sites/default/files/uploads/2019-06/190614%20Anonymisation%20and%20Pseudonymisation.pdf; Spain, available at https://edps.europa.eu/sites/edp/files/publication/19-10-30_aepd-edps_paper_hash_final_en.pdf.

Example: At the time of the assessment, it can be assumed that it would take a “re-identification attacker” at least two days to remove the de-identification, and the software required for this would cost around EUR 100. However, the data set contains tens of thousands of bank details that could be sold illegally at very high profit. In this case, the time and costs required are low in relation to the potential gain, and the de-identification is therefore not sufficient to qualify as de facto anonymization.
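As a toy illustration of this proportionality test, the following sketch compares an assumed attack cost with an assumed resale value; all figures are hypothetical, and the GDPR prescribes no such formula.

```python
# Toy proportionality check for the example above. All figures are
# hypothetical assumptions; the GDPR prescribes no such formula.
attack_cost_eur = 100 + 2 * 8 * 50  # software plus two working days at EUR 50/hour
records = 40_000                    # assumed number of bank details in the set
resale_value_per_record_eur = 5.0   # assumed illegal resale value per record
expected_gain_eur = records * resale_value_per_record_eur

# A gain that dwarfs the cost makes a re-identification attempt
# "reasonably likely", so the de-identification is not sufficient here.
print(f"cost: {attack_cost_eur} EUR, potential gain: {expected_gain_eur:.0f} EUR")
print("attack economically attractive:", expected_gain_eur > attack_cost_eur)
```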

16 Article 29 Working Party, WP 216: Opinion 5/2014 on Anonymization Techniques, page 10, available at https://datenschutz.hessen.de/sites/datenschutz.hessen.de/files/wp216_de.pdf.




Objective factors for assessing de facto anonymization (overview):

- Re-identification techniques: status quo and foreseeable development
- Re-identification effort: cost and time
- “Value” of the raw data:
  - “Sensitivity” of the type of data concerned (according to the severity of the possible risk)
  - “Sensitivity” of the data subjects (children, celebrities, politicians)
  - High number of data subjects
  - Complexity and high information content of the data sets

The term “sensitivity” of the type of data is not to be understood as congruent with the special categories of personal data in accordance with Article 9 GDPR. Rather, “sensitivity” here refers to the possible impact on the data subjects (see 6.1.1). Even in the case of other information not mentioned in Article 9 GDPR, a re-identification can have drastic consequences for a data subject.

Example: A company that sells pet food to end customers wants to de facto anonymize data sets for the analysis of the purchasing behavior of customers who only buy dog and cat food. In addition to the type of pet food purchased in each transaction, the sets of data also contain EC and credit card information. While the information about what kind of pet food a person has bought would, even if published, most likely not lead to any damage to the material or immaterial interests of the data subjects, the EC and credit card information is “sensitive” from the point of view of the data subjects, as its disclosure can lead to great material harm. Because of this information, higher demands must be placed on the de facto anonymization.

In addition, the question often arises in practice whether the costs of de-identification incurred by the controller can be taken into account, i.e. whether the effort of re-identification can be weighed against the effort of de-identification up to de facto anonymization. The Article 29 Data Protection Working Party stated in its opinion on anonymization under the Data Protection Directive: “For instance, they [the controllers] should balance their anonymization effort and costs (in terms of both time and resources required) against the increasing low-cost availability of technical means to identify individuals in datasets, the increasing public availability of other datasets (such as those made available in connection with ‘Open data’ policies), and the many examples of incomplete anonymization entailing subsequent adverse, sometimes irreparable effects on data subjects.”17 This means that the economic interest in the least expensive de-identification must be weighed against the interests of the data subjects. Nevertheless, it remains decisive that the interests of the data subjects are (only) adequately protected once a sufficient degree of de facto anonymization has been achieved.

When determining the likelihood of re-identification, information must be taken into account that may be available in addition to the anonymized data. Possible means of assistance that enable conclusions to be drawn about identifiers can, for example, be additional information accessible to a third party (such as a potential “re-identification attacker”), for example where the methodology is disclosed or external data is linked to the anonymized data in such a way that it yields re-identifying information.18 The anonymized data must therefore also be protected against unauthorized access by suitable security measures.

To date, there are no reliable guidelines from supervisory authorities regarding the objective factors to be observed. It is hardly possible to quantify in general terms when a re-identification would require a disproportionately large amount of time, money and manpower and when an anonymization therefore qualifies as a sufficient de facto anonymization. The effectiveness of the de facto anonymization must therefore always be assessed for the respective data set, taking into account all objective factors.

17 Article 29 Working Party, WP 216: Opinion 5/2014 on Anonymization Techniques, page 10, available at https://datenschutz.hessen.de/sites/datenschutz.hessen.de/files/wp216_de.pdf.

Before applying the possible techniques, it should be considered in each case whether the data will still be suitable for the intended purpose after the respective de-identification measure(s) have been applied: a greater degree of de-identification is always associated with a greater falsification and coarsening of the data. Depending on the planned use, data sets lose robustness and informative value as the degree of de-identification increases. In some cases, data may even lose its meaningfulness entirely or allow incorrect conclusions to be drawn, which would render it worthless.

5.3 Possible levels of de-identification measures

In order to achieve the degree of de-identification required for de facto anonymization, it is usually necessary to combine various de-identification techniques (see 6.4.3).

Example of a step-by-step approach: Data from users of a “video on demand” service is collected for statistical analysis in an already formally anonymized or at least heavily pseudonymized form. In a next step, the data is aggregated.

For example, the following “levels” are possible for de-identification (the first level is not a de-identification technique but relates to the collection itself, and the levels must be selected for the specific use case). For an explanation of the individual techniques, see 6.1 below.

Level 1: Restriction of the attributes to the necessary extent
Level 2: Erasure of personal data
Level 3: Randomization
Level 4: Generalization/aggregation

Hence, in order to preserve the informative value of the de-identified data, the intended use case should always be taken into account when assessing the required degree of de-identification. The selection of suitable techniques must consequently be decided for each individual case. Effective anonymization measures will regularly consist of a bundle of different techniques (see 6.1 below). It is therefore conceivable that certain techniques would render a data set unusable for a specific use case.

18 Jan Dennis Gumz, Mike Weber, Christian Welzel (Kompetenzzentrum für Öffentliche IT), Anonymisierung: Schutzziele und Techniken, page 10, available at https://cdn0.scrvt.com/fokus/784daae14fc72f91/bcebf7142066/Anonymisierung---Schutzziele-und-Techniken.pdf.




Example: An ordering application shows customers and drivers the positions along the delivery route and the time. This data is to be anonymized for the optimization of delivery routes and then evaluated. The data can be attributed to specific customers and specific drivers via the customer data, the start and end points and the delivery time. Even if the customer number is removed, a personal reference can still be established via time and route. If the time is removed, a personal reference can still be established via the start and end points. Moreover, the time is relevant for optimizing the delivery route, but it can be generalized into time segments (for example, morning, noon, afternoon, evening, night). If the position/route information is also removed, the data is likely to be sufficiently anonymized, but also unusable for the planned evaluation, since it no longer provides any information about the delivery route. If the route information is aggregated, it can be sufficiently anonymized, but with sufficient aggregation the route data loses its informative value. However, if the start and end points are generalized (by expanding the cluster, for example to the urban area) and the data sets are enriched with “synthetic” data (e.g. data sets artificially added via different route planners), sufficient de facto anonymization may be achieved while the data sets remain usable for the intended use case.
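The generalization step from this example could look roughly like the following sketch, which coarsens timestamps into day segments and snaps start and end coordinates to a wide grid. The segment boundaries and grid size are illustrative assumptions, not values prescribed by the GDPR or this guide.

```python
# Minimal sketch of generalization: exact timestamps become coarse day
# segments and GPS positions are snapped to a wide grid. All thresholds,
# coordinates and field names are hypothetical.
from datetime import datetime

def day_segment(ts: datetime) -> str:
    hour = ts.hour
    if 5 <= hour < 11:  return "morning"
    if 11 <= hour < 14: return "noon"
    if 14 <= hour < 18: return "afternoon"
    if 18 <= hour < 23: return "evening"
    return "night"

def coarsen(lat: float, lon: float, cell: float = 0.1) -> tuple:
    """Snap coordinates to a ~0.1-degree grid (several kilometres wide)."""
    return (round(lat / cell) * cell, round(lon / cell) * cell)

delivery = {"time": datetime(2020, 7, 1, 9, 42),
            "start": (52.5208, 13.4095), "end": (52.4862, 13.3717)}

generalized = {"segment": day_segment(delivery["time"]),
               "start": coarsen(*delivery["start"]),
               "end": coarsen(*delivery["end"])}
print(generalized)  # e.g. {'segment': 'morning', 'start': (52.5, 13.4), ...}
```

Whether such coarsening, possibly combined with synthetic records, reaches de facto anonymization still has to be assessed against the objective factors described in 5.2.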

Taking the intended use case into account when assessing the anonymization method to be applied in the individual case also corresponds to the principle of “privacy by design.” Pursuant to this principle, personal data must be de-identified at the earliest possible point in time, in the interests of data minimization and of protecting data integrity (through greater security), insofar as this is possible in accordance with the intended use case and other legal requirements for the storage of personal data. It may moreover be appropriate to collect data only in a de-identified (i.e. usually pseudonymized) form, so that the personal reference exists at the beginning of the planned processing in an already reduced form and (some) de-identification measures do not have to be applied in the first place. After all, it is frequently unnecessary for the controller to be able to directly identify the data subjects at any point in time.19

Example: A delivery service wants to evaluate the efficiency of its deliveries in terms of time and fuel consumption. When analyzing these processes, a range of personal data may also be affected (employee data as well as customer data, for example addresses), even though this data is not required for the intended efficiency review. In order to comply with the principle of data minimization from the time of collection, the controller should already provide a technical solution when collecting the data, so that the sets of data are only collected and stored in a de-identified form.
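Such a technical solution at collection time could, in its simplest form, resemble the following sketch: identifying attributes are discarded before a record is stored at all. The field names are hypothetical.

```python
# Minimal sketch of "privacy by design" at collection time: identifying
# fields are dropped before the record is ever stored, so later
# de-identification steps start from a reduced data set. Field names
# are hypothetical.

NEEDED_FOR_EFFICIENCY = {"route_km", "duration_min", "fuel_l"}

def collect(raw_event: dict) -> dict:
    """Keep only the attributes needed for the efficiency analysis."""
    return {k: v for k, v in raw_event.items() if k in NEEDED_FOR_EFFICIENCY}

event = {"driver_id": "D-17", "customer_address": "Lindenweg 5",
         "route_km": 12.4, "duration_min": 38, "fuel_l": 1.1}

store = [collect(event)]  # driver and customer identifiers are never stored
print(store)              # [{'route_km': 12.4, 'duration_min': 38, 'fuel_l': 1.1}]
```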

In this context, the European Data Protection Board, for instance, has recommended for the area of connected vehicles that data be de-identified in the car before it is subsequently transmitted.20

5.4 Erasure of the original data set In many constellations a controller may use the same set of data itself in a personal form (for example for the provision of services) and at the same time want to de-identify it in another form in order to use the reduced data for another purpose (this is the case, for example, if the personal data is absolutely necessary for the sales department to provide the customer's services and at the same time the research and development department would like to use the anonymized sets of data for further development). Admittedly, keeping both data sets at the same time increases the risk of re-identification. However, as long as the reduced data set is sufficiently separated from the original set of data and cannot be associated with it, we believe that effective anonymization can still be achieved in individual cases without the original set of data having to be erased. In such a case, the requirements are naturally higher, since access to the original data set (or, for example, to a hash key) may under certain circumstances be used to re-identify the de-identified data sets, meaning that no de facto anonymized data in fact exists. However, the risk of re-identification can be reduced (despite keeping the original data record) by implementing the organizational measures below.21 In addition to the various "general" measures with regard to de facto anonymization (for example, regular review of the de-identification process to determine whether (un)authorized third parties can carry out a re-identification; see point 6.2, last paragraph), the erasure of the original data record certainly contributes significantly to achieving GDPR-compliant de facto anonymization. In our opinion, the simultaneous use of the same set of data in personal and de facto anonymized form is possible at least where accompanying measures ensure that re-identification is not possible, taking into account the means reasonably likely to be used by the controller (i.e. by the respective departments or specialist areas within the same controller that should not have access to the personal data).

19 European Data Protection Board, Guidelines 4/2019 on Article 25 Data Protection by Design and by Default, p. 20.

20 European Data Protection Board, Guidelines 1/2020 on processing personal data in the context of connected vehicles and mobility related applications, p. 16.


Depending on the respective use case, the following measures (or a combination of these) can be taken:

• The controller should ensure that the group of people with access to the original data and the group of people with access to the de-identified data do not overlap.

• The controller can combine several de-identification techniques in such a way that linking the personal data sets with the de-identified data sets is technically significantly more difficult or practically impossible for it – regardless of other accompanying measures. Ideally, the anonymization is also carried out by a third party that does not inform the controller about the techniques used.22

• If the de-identification is carried out by a third party, it can be ensured that the controller does not know the de-identification procedure, so that re-engineering by the controller does not constitute a means "reasonably likely to be used."

• Additional technical and organizational measures (for example, rights and role concepts, separate databases for personal and de-identified data sets, etc.) can ensure that simultaneous access to personal data and de-identified data is excluded and that the data sets are not "mixed up."

• Regular reviews of the measures used must ensure that a "re-identification attacker" does not have access to the original data set and thus cannot link the data sets and thereby re-identify them.

21 It should be noted that an official or judicial clarification of the effectiveness of anonymization where both the "anonymized" and the original data set are in the possession of the same controller has not yet taken place.

22 See also section 7.



6. Technical requirements for effective de facto anonymization

The effective implementation of de facto anonymization (i.e. the fulfillment of certain formalized anonymity criteria) depends on the anonymization technique(s) used. The GDPR itself does not specify which anonymization techniques should be used. This section presents some of the common de-identification techniques and recommendations for checking their effectiveness.

6.1 Overview of de-identification techniques A variety of de-identification techniques can be used to de-identify personal data. Depending on the methodological approach and the potential "re-identification attack model," these satisfy certain formalized anonymity criteria (for example: k-anonymity, l-diversity, t-closeness, differential privacy). Which de-identification technique, or which combination of techniques, can guarantee sufficient de facto anonymization must always be assessed in light of the specific individual case at hand.

6.1.1 Removal of identifiers Personal data can consist of identifying attributes (e.g. name or identity card number), quasi-identifying attributes (e.g. date of birth, place of residence or gender) as well as sensitive attributes (e.g. illnesses, sexual orientation, very old age, etc.). In this context, the term "sensitive attribute" is not to be equated with the special categories within the meaning of Article 9 (1) GDPR. One speaks of a sensitive attribute if the disclosure of the content and its attribution to a person create a particular risk of potential harm or invasions of privacy (this also includes, for example, bank details, social security numbers or photographs).23 By removing the identifying and quasi-identifying attributes, data can be de-identified. In this case, one or more identifying or quasi-identifying attributes (i.e. identifiers) are completely deleted from a set of data, so that conclusions about an individual person are no longer possible, or at least become very difficult. Yet, removing these identifiers is usually only the first step towards de facto anonymization. Example: The name of the user as well as the user and vehicle numbers are deleted from GPS location data generated by vehicles. In this way, the GPS location data can only be traced back to a single person under difficult conditions (and possibly only with corresponding additional knowledge).
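A minimal Python sketch of this first step (the record structure and attribute names are hypothetical):

```python
# Removal of identifiers: drop identifying and quasi-identifying attributes
# from GPS records before any further de-identification steps.
records = [
    {"name": "A. Miller", "user_id": "U-102", "vehicle_id": "V-7",
     "lat": 48.13743, "lon": 11.57549, "speed_kmh": 42},
]

IDENTIFIERS = {"name", "user_id", "vehicle_id"}

de_identified = [
    {key: value for key, value in record.items() if key not in IDENTIFIERS}
    for record in records
]
print(de_identified)  # only lat, lon and speed_kmh remain
```

As the guide notes, such removal alone will usually not achieve de facto anonymization; the remaining location data may still allow re-identification with additional knowledge.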

6.1.2 Randomization Randomization/perturbation (i.e. a type of “disturbance”) refers to techniques (see a selection of individual such techniques below under 6.1.2.1 to 6.1.2.6) with which data values are replaced by artificially generated values in order to “alter” or “perturb” a data set in such a way that the direct link between certain data and the data subjects is removed. The data should only be altered to such an extent that at least statistical properties of the data set are retained for analysis purposes.

6.1.2.1 Data swapping In swapping, certain attributes of a data subject are artificially swapped for attributes of another person. Ideally, this happens randomly or pseudo-randomly,24 and it must be ensured that no data set ultimately reproduces itself. The technique can be improved if the variables of a specific person do not exactly match the variables of the other person.

23 In general on these concepts, Dietmar Hauf, page 8, available at: https://dbis.ipd.kit.edu/img/content/SS07Hauf_kAnonym.pdf.

24 Pseudorandomness is calculated randomness. To the observer it looks like "real" randomness, but it can be reversed with knowledge of the key material.



Example: In a customer list, the customer's place of residence is to be swapped. For example, if person A lives in place X and person B in place Y, then after swapping the "place" information the database shows person A living in place Y and person B living in place X. If, however, further elements were swapped between person A and person B, the set of data could end up largely reproducing itself, and the purpose of the swap would not be achieved. Therefore, only a non-decisive part of the set of data should be swapped between two specific sets of data.
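A minimal Python sketch of random swapping (hypothetical attribute names; the retry loop implements the guide's caveat that the swap must not reproduce the original data set, and assumes at least two rows with distinct values):

```python
import random

def swap_attribute(rows, attribute, rng=None):
    """Randomly permute one attribute across the rows; retry until no row
    keeps its original value, so the data set does not reproduce itself."""
    rng = rng or random.Random()
    original = [row[attribute] for row in rows]
    values = original[:]
    while any(new == old for new, old in zip(values, original)):
        rng.shuffle(values)
    for row, value in zip(rows, values):
        row[attribute] = value

customers = [
    {"customer": "A", "place": "X"},
    {"customer": "B", "place": "Y"},
]
swap_attribute(customers, "place")
print(customers)  # person A is now mapped to place Y and vice versa
```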

6.1.2.2 Cryptographic hash function A cryptographic hash function maps an input value of any length to an output value of fixed length – the so-called hash value. A cryptographic hash function is a one-way function, so that no conclusions can be drawn about the original data from the hash value alone. In addition, a cryptographic hash function is collision-resistant, meaning that it is practically impossible to find two different input values that produce the same hash value. This process is known as hashing. The cryptographic hash function itself is standardized and to that extent (generally) known. The use of a cryptographic hash function therefore does not automatically protect against reversal: a re-identification attacker who knows the stored hash value can hash various candidate input values using the known hash function until one matches the stored hash value. Reversal therefore depends on the extent to which a re-identification attacker knows or can limit the type of possible input values (e.g. telephone numbers). To make reversal more difficult, a random value is often added to the input value, which changes the hash value. If this random value is stored openly, it is referred to as "salt"; if it is kept secret, it is called "pepper." In order for the random value to offer the highest possible security against a re-identification attacker, it should be of sufficient complexity and length and kept as secret as possible. In addition, other de-identification techniques (such as stochastic overlay; see 6.1.2.3) or specific technical and organizational measures (such as access restrictions and restrictive rights and roles) are recommended. Example: In practice, hashing is used, for instance, to avoid having to save the passwords of online-portal users in clear text, i.e. unencrypted. Only the hash value, i.e. the result of the cryptographic hash function applied to the password, is saved. If a password is entered, a hash value is likewise generated from the entry; if the two hash values match, it is virtually certain that the password entered matches the password stored in the database. To prevent the hash values of simpler passwords from being determined by trial and error, a random value is usually added to the password before hashing (salt). Another widespread use of hashing is the de-identified storage of IP addresses, for which the same procedure can be used.
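A minimal Python sketch of salted hashing using the standard library (the IP address and the salt length are arbitrary illustrative choices):

```python
import hashlib
import secrets

def hash_with_salt(value: str, salt: bytes) -> str:
    """Return the SHA-256 hash of salt + input value as a hex string."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

salt = secrets.token_bytes(16)              # sufficiently long random value
stored_hash = hash_with_salt("192.0.2.17", salt)

# Verification: the same input with the same salt reproduces the hash value.
assert hash_with_salt("192.0.2.17", salt) == stored_hash
```

For password storage specifically, a deliberately slow key-derivation function (for example hashlib.pbkdf2_hmac) would usually be preferred over a single SHA-256 round, as it further increases the cost of trial-and-error attacks.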

6.1.2.3 Stochastic overlay ("additive noise") In the case of stochastic overlay, a random "measurement error" is deliberately added to the data, for example by adding random values to the existing values. This method can only be used on numeric values.


Example: In the case of numerical values, for instance, the last digit is replaced by a random number (for example with GPS coordinates).
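A minimal Python sketch of such an overlay (the noise scale is a hypothetical choice and would have to be tuned to the protection goal and the intended analyses):

```python
import random

def add_noise(value: float, scale: float, rng=None) -> float:
    """Overlay a numeric value with a random error drawn from [-scale, scale]."""
    rng = rng or random.Random()
    return value + rng.uniform(-scale, scale)

latitude = 48.13743
noisy_latitude = round(add_noise(latitude, scale=0.0005), 5)
print(noisy_latitude)  # e.g. 48.13721 – the exact position is blurred
```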

6.1.2.4 Synthetic data generation In this method, artificial data sets are created on the basis of a statistical model. The model is constructed with reference to statistical attributes of the original data, so that the synthetic data reproduce statistical properties of the original data. Samples are then drawn from the model in order to form a new data set. Example: From a dataset about burglaries in a certain region, only the statistical findings are extracted into a mathematical model, which then calculates other scenarios based on these statistical findings and, where applicable, other added parameters.
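A minimal Python sketch of the idea (a deliberately simple model – a normal distribution fitted to one numeric attribute; real synthetic-data generators model far richer structure):

```python
import random
import statistics

# Original numeric attribute, e.g. burglaries per district and month.
original = [3, 7, 4, 6, 5, 9, 2, 5, 6, 4]

# Fit a simple statistical model (here: a normal distribution) ...
mu = statistics.mean(original)
sigma = statistics.stdev(original)

# ... and draw samples from the model to form a new, fully artificial data set.
rng = random.Random(42)
synthetic = [max(0, round(rng.gauss(mu, sigma))) for _ in range(len(original))]
print(synthetic)  # artificial values that mimic the original distribution
```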

6.1.2.5 Perturbation With perturbation, data values are replaced by artificial values. The aim is to change the data in such a way that statistical properties of the data set are nevertheless retained for analyses. These methods offer a high level of protection against attacks, since the generated entries, which are created using random-based methods, no longer correspond to real persons. This is, however, also a disadvantage, because flexibility in terms of analyses is lost.

Example: A comprehensive data set contains, in addition to the classification "jobseeker, in training, self-employed, employed and retired," the decade of birth (1950 to 1959, 1960 to 1969, 1970 to 1979, 1980 to 1989, 1990 to 1999, etc.) of the corresponding individuals. These values are replaced by randomly generated artificial information.

6.1.2.6 Permutation With permutation, data is shuffled between data sets within attributes. With this method, no values of the data set are altered; instead, the original data set is broken down into two parts (for example two tables) that are linked via a group ID. This weakens the association between the values from table 1 and the values from table 2.



Example: In a data set of 30 patients, the table with the personal data is divided into the quasi-identifying attributes (age, gender and place of residence) on the one hand and the sensitive attributes on the other (course of the disease, symptoms). The split tables are still linked to one another via a group ID. 30 different sensitive values (table 2) are now possible for one entry in table 1 (quasi-identifying attributes). It is no longer possible to determine what course of disease and what symptoms the patients on the list have.
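A minimal Python sketch of such a split (hypothetical patient records; shuffling the group IDs of the sensitive table is one simple way to weaken the link between the two tables):

```python
import random

patients = [
    {"group_id": 1, "age": 34, "gender": "f", "city": "Bonn", "diagnosis": "fracture"},
    {"group_id": 2, "age": 58, "gender": "m", "city": "Kiel", "diagnosis": "diabetes"},
    {"group_id": 3, "age": 41, "gender": "f", "city": "Ulm", "diagnosis": "asthma"},
]

# Split into a table of quasi-identifying attributes ...
quasi_identifiers = [{k: p[k] for k in ("group_id", "age", "gender", "city")}
                     for p in patients]
# ... and a table of sensitive attributes, linked only via the group ID.
sensitive = [{"group_id": p["group_id"], "diagnosis": p["diagnosis"]}
             for p in patients]

# Shuffle the group IDs of the sensitive table to weaken the association.
ids = [row["group_id"] for row in sensitive]
random.shuffle(ids)
for row, new_id in zip(sensitive, ids):
    row["group_id"] = new_id
```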

6.1.3 Generalization/aggregation Data can be de-identified by reducing their accuracy using various techniques (see a selection of individual techniques below under 6.1.3.1 and 6.1.3.2). Categorical values can be replaced by more general values based on a taxonomy (for instance, the term "academic" replaces the designations judge, doctor or pharmacist). For numeric attributes, exact information is replaced by intervals (for example, the age 30 is replaced by the interval 30-35). This makes the data less specific, so that it can no longer easily be traced back to individual persons. However, if the number of sets of data is too small or the spread is too low, aggregation can still allow a personal reference.
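A minimal Python sketch of both variants (the taxonomy and the interval width of five years are hypothetical choices):

```python
# Taxonomy-based generalization of categorical values.
TAXONOMY = {"judge": "academic", "doctor": "academic",
            "pharmacist": "academic", "electrician": "craftsman"}

def generalize_age(age: int, width: int = 5) -> str:
    """Replace an exact age with an interval, e.g. 32 -> '30-35'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width}"

record = {"occupation": "doctor", "age": 32}
generalized = {
    "occupation": TAXONOMY.get(record["occupation"], record["occupation"]),
    "age": generalize_age(record["age"]),
}
print(generalized)  # {'occupation': 'academic', 'age': '30-35'}
```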

6.1.3.1 Use of various generalization schemes Depending on the generalization approach, a distinction can be made between different schemes:

• In a so-called "full-domain generalization scheme," all values of an attribute are generalized to the same level. If "doctor," "judge" and "pharmacist" are replaced by "academic" in the above example, "electricians" and "painters" would also have to be generalized to "craftsmen."

• In a so-called "subtree generalization scheme," all so-called "child nodes"25 of a "parent node" are generalized.

• A so-called "sibling generalization scheme" is similar to the above, but here only specific child nodes of a parent node are generalized. For example, "doctor" can be replaced by "academic" without altering the designation "judge."

• A so-called "cell generalization scheme," on the other hand, allows the generalization of only selected individual values. For example, the value "judge" can be generalized in one entry and at the same time retained in another entry in the same table.

• So-called "multidimensional generalization" considers several attributes at the same time and provides different generalization approaches for the respective attributes. For example, the group "Doctor, 32" can be replaced by "Doctor, (30-40)," whereas all entries with "Doctor, 36" are generalized to "Academic, 36."

25 In graph theory, a node is an element of the node set of a graph. An edge indicates whether two nodes are related to one another, i.e., connected to one another in the graphical representation of the node set. For a node other than the root, the node to which it is connected via an incoming edge is called the parent node. Conversely, all nodes connected to a node by an outgoing edge are called children, child nodes, or descendants (the gender-neutral names "parent" and "child" have largely displaced the older "father" and "son" terminology).

Example: In a set of data about patients – after the (unambiguously) identifying attributes (name, health insurance number) have been erased – all quasi-identifying and sensitive attributes are generalized one level up (the patient's exact home address becomes the neighborhood, the age becomes a specified age range and the broken leg becomes a fracture).

6.1.3.2 Micro-aggregation Micro-aggregation describes a de-identification technique in which the data are grouped according to similarity in the attribute values and the individual values of each group are combined into a representative value, such as the mean or median. While individual attribute values are altered (or generalized) with classic aggregation, with micro-aggregation the attribute values remain the same and are only summarized. Micro-aggregation therefore has the advantage over classic aggregation, among others, that it leads to less data loss and regularly maintains the granularity of the data to a higher degree.

Example: A very simple form of aggregation is the summary of all data points into an average value. In principle, this no longer allows any conclusions to be drawn about individuals (for example, the average salary of a software developer in a larger group). Patient data, for instance, can be de-identified with the help of micro-aggregation by first dividing the patients into groups according to age and then replacing the individual age values within an age group with the mean age of this group.
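A minimal Python sketch of micro-aggregation on a single attribute (fixed groups of three sorted values; real implementations group by multivariate similarity):

```python
from statistics import mean

ages = [23, 25, 24, 41, 43, 40, 67, 65, 66]

def micro_aggregate(values, group_size=3):
    """Group similar values and replace each group by its rounded mean."""
    ordered = sorted(values)
    result = []
    for i in range(0, len(ordered), group_size):
        group = ordered[i:i + group_size]
        result.extend([round(mean(group))] * len(group))
    return result

print(micro_aggregate(ages))  # [24, 24, 24, 41, 41, 41, 66, 66, 66]
```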

6.2 Formalized anonymization criteria

A distinction is made between de-identification techniques on the one hand and formalized anonymization criteria on the other. "Formalized anonymization criteria" are not techniques as such, but a mathematical description of the specific "security level" of the intended de-identification as a result of the planned (combination of) de-identification techniques. The fulfillment of a formalized degree of de-identification is not synonymous with the achievement of de facto anonymization; rather, the other necessary criteria (see 6.2.1 to 6.2.3) must also be observed.

6.2.1 Differential privacy Differential privacy is a mathematical definition of requirements intended to make the degree of de-identification measurable.26 Differential privacy aims to provide an accurate indication of the likelihood of re-identification without the need to identify individual data sets. The level of the re-identification risk is determined by the parameter epsilon (ε),27 expressed as the probability that a query on a database containing an additional data set will produce the same result as a query on another database that does not contain this data set. The smaller the factor ε, the higher the protection against a re-identification attack. Which value ε must assume in order to achieve de facto anonymization according to this measurement method can only be assessed for each individual situation, since the quantity of the data plays a particularly significant role here.

26 A randomized function κ provides ε-differential privacy if, for all data sets D1 and D2 which differ in at most one entry, and for all S ⊆ Range(κ), the following applies: Pr[κ(D1) ∈ S] ≤ e^ε · Pr[κ(D2) ∈ S].

27 Papastefanou, "Database Reconstruction Theorem" und die Verletzung der Privatsphäre (Differential Privacy), CR 2020, 379-386 (382 et seq.).



Example: At a conference, the number of participants per subject area is to be published. For this purpose, the data is aggregated and random noise is added to the result, which is selected according to the contribution of an individual user (for example, each participant can choose a maximum of three subject areas and is only counted once for each subject area). If the possible contribution of the users changes, the parameters of the noise must also be changed (for example, if a user is to choose only a single subject area).

The catchphrase "local differential privacy" is understood to mean the addition of statistical noise in such a way that drawing conclusions about individuals becomes impossible, while the data de-identified in this manner still permit statistical evaluation. "Central differential privacy," on the other hand, means that data is first aggregated and then provided with random noise in order to disguise the existence of individual data sets of users in the collected data. In both cases, the noise comes from a recognized distribution (usually Laplace or Gauss) with predetermined parameters, which are obtained from known properties of the existing data sets (for example, how often a single user has contributed a value and how much influence his data has on the result of the aggregation).
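A minimal Python sketch of the central variant using the Laplace mechanism (the counts, epsilon and sensitivity are illustrative; the noise scale sensitivity/epsilon follows the standard definition of ε-differential privacy for counting queries):

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
             rng=None) -> float:
    """Release a count with epsilon-differential privacy. The noise scale is
    sensitivity/epsilon, so a smaller epsilon yields stronger protection."""
    rng = rng or random.Random()
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# One participant changes a per-subject counter by at most 1 (sensitivity 1),
# e.g. because each participant is counted only once per subject area.
print(dp_count(true_count=128, epsilon=0.5))
```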

6.2.2 k-anonymity k-anonymity is a formal data protection model that describes the probability of whether one set of data can be linked to another. This allows a statement to be made about the probability of re-identification.

For de-identification, k-anonymity requires that sets of data are altered to such an extent that no conclusion can be drawn about a single person (i.e. each person is indistinguishable from at least k-1 other persons). The "k-value" expresses how often a combination of attributes of a set of data occurs within a data collection (so-called equivalence class).

Example: As part of a medical study, the zip code, attending physician and illness are saved, and the sensitive information about the illness is to be de-identified. If there are two identical entries in the table with the same attributes for zip code, attending physician and illness, the k-value is 2.
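A minimal Python sketch for determining the k-value of a table (hypothetical quasi-identifier tuples):

```python
from collections import Counter

# Quasi-identifying attributes per record: (zip code, attending physician).
records = [
    ("53113", "Dr. A"), ("53113", "Dr. A"),
    ("53115", "Dr. B"), ("53115", "Dr. B"), ("53115", "Dr. B"),
]

def k_anonymity(quasi_identifier_tuples) -> int:
    """The k-value of a table is the size of its smallest equivalence class."""
    classes = Counter(quasi_identifier_tuples)
    return min(classes.values())

print(k_anonymity(records))  # 2 -> every record is indistinguishable from
                             # at least k-1 = 1 other record
```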


Yet, k-anonymity has weaknesses. Due to the homogeneity of the equivalence classes (i.e. all k sets of data of an equivalence class have identical attributes) or due to additional background knowledge (i.e. an attacker knows about the existence of a person in a database and can attribute this person to the correct equivalence class, so that they can potentially exclude certain sensitive attributes for the person due to the additional knowledge), re-identification remains possible. These weaknesses are to be remedied through further developments of k-anonymity (through l-diversity and t-closeness, see below).

6.2.3 l-diversity and t-closeness l-diversity is an extension of k-anonymity intended to remedy the weak point of the k-anonymity model that k does not make a statement about the representation of sensitive values within the k-group. In the case of l-diversity, the association of a sensitive or otherwise easily identifiable attribute (cf. 6.1.1) with a person is protected by hiding it in a set of at least l different sensitive attribute values. An attacker therefore needs background knowledge of at least l-1 values in order to be able to deduce the correct attribute by excluding the incorrect sensitive attributes.

Example: With a k-factor of 5, for example, two persons over 100 years of age are included and the age information is available in the data set. Given the background knowledge that two persons over 100 years of age are present in the data set, and since persons over 100 years of age are very rare, these two people can easily be re-identified. By further de-identifying this information, for example by attributing people over 100 years of age to a different age group, the set of data can be anonymized in line with l-diversity.

The t-closeness model, in turn, refines the approach of l-diversity by forming equivalence classes (also known as blocks) that are similar to the original distribution of the attribute values in the data. Another condition is introduced for this: not only should at least l different values be represented in an equivalence class, but each value must also be represented in a block as frequently as corresponds to the original distribution of each individual attribute.

Example: An insurer wants to create a statistical overview of the districts in which most insurance claims are reported by customers and also break this down by age. To do this, however, it is not absolutely necessary to directly relate and process the zip code, age and number of reported insurance claims of the customers concerned. If the t-closeness method is applied accordingly, the last digit of the zip code could be omitted and the zip codes divided into blocks, for example 8080*, 8033*, etc. The same can be done with the age of the customer, for example in five-year steps. The number of reported insurance claims would consequently only be displayed and processed in relation to these blocks (i.e. zip code areas and age ranges).
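A minimal Python sketch for checking l-diversity (hypothetical equivalence classes; the check returns the smallest number of distinct sensitive values per class):

```python
from collections import defaultdict

# (equivalence class of quasi-identifiers, sensitive attribute value)
rows = [
    (("531*", "30-35"), "fracture"),
    (("531*", "30-35"), "asthma"),
    (("531*", "30-35"), "diabetes"),
    (("532*", "40-45"), "fracture"),
    (("532*", "40-45"), "fracture"),
]

def l_diversity(rows) -> int:
    """l is the minimum number of distinct sensitive values per class."""
    values = defaultdict(set)
    for eq_class, sensitive in rows:
        values[eq_class].add(sensitive)
    return min(len(v) for v in values.values())

print(l_diversity(rows))  # 1 -> the second class is not diverse at all
```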

6.3 Effectiveness of anonymization Depending on the nature of the raw data set, a combination of the formalized anonymization criteria or de-identification techniques listed above (see 5.3) may be necessary. The methods used for de-identification must ensure that, in terms of de facto anonymization (see 3.3.1 above), it cannot be expected that someone will be able to:

• pick out a single person from the database ("singling out") – this is the case as long as sets of data (such as a table) can be attributed to individual persons;

Example: In an employee list, a wide variety of attributes are swapped or stochastically overlaid in the sets of data, but not the position of the employee in the company. Since there is only one head of the IT department in many companies, he can be easily identified from the database (if one knows that the employee's position has not been swapped), even if some of the attributes stored in the database are not accurate for the head of the IT department.

• establish a connection between two data sets of a database or between two independent databases in the sense of linkability – this means the possibility of attributing at least two different entries to the same person or the same group of people, regardless of whether they are in the same database or not;28

Example: When de-identifying search queries via publicly accessible search engines, for instance, it cannot be ruled out that the data could be linked to a specific person with the help of other information available on the Internet.

28 In individual cases, the quality of the information provided by the anonymized data (e.g. statistics) can be contingent on various sets of data remaining attributable to a single person without this person having to be known. In such a case, effective anonymization would have to ensure that re-identification is not possible despite the linking of several sets of data.



• derive information from a database by means of inference – it must in all likelihood be impossible to derive the value or content of a set of data from that of other entries.

Example: A census collects certain information about the population. The data is published, but aggregated in such a way that a result is returned only where at least five people match each query; otherwise no result is given. However, the selection of certain criteria for the inhabitants of a locality (gender, age group, nationality) results in exactly five people whose school education is identical. It can thus be deduced from the aggregated data that if you meet a person from this locality with the corresponding gender, age and nationality, you implicitly also know their schooling.

If one or more of these "re-identification attacks" is successful, i.e. enables an – at least partial – re-identification, personal data still exists. Generally, no single method or criterion is sufficient in itself to effectively anonymize data de facto. Sufficient de facto anonymization therefore regularly requires a combination of different methods of randomization and generalization (see also 5.3 and 6.4.3). Note: The effectiveness of the de facto anonymization must be assessed and documented for the respective situation at hand and the methods used for de-identification. The following questions can serve as a guide:

• Who could have a motive for re-identification?

• What resources are available to someone for re-identification?

• What steps and what (time) effort would be required to restore the personal reference to the de-identified data?

• What data is publicly available that could be used to restore the personal reference? This could be information from public registers (e.g. commercial register, land register, register of associations, etc.), information from social media, information that can be found via search engines or databases, or information that can otherwise be accessed via the Internet or from other data sources.

• Are there third parties with access to the de-identified data who possess further information on the basis of which the personal reference could be restored (e.g. the original data set, or location data which can enable identification in combination with the de-identified data)?

6.4 Selection of the anonymization method To determine the selection of the anonymization method (i.e. the required combination of de-identification techniques), the following aspects in particular must be taken into account:

6.4.1 What kind of data sets is involved? The first step is to evaluate the type and nature of the sets of data concerned:

• What "sensitivity" does the data have? (Sensitivity describes not only special categories of personal data within the meaning of Article 9 GDPR, but also information that is particularly relevant for the data subject, such as bank and account information; see 5.2.)

• Would re-identification entail a high risk under data protection law?

• How much data, how many persons and which ("type" of) persons (e.g. children) are affected?



6.4.2 For which use case is the data being anonymized?

In order for the data to be anonymized to have the quality required for the respective use case, the specific purpose of the anonymized data must also be taken into account when selecting the de-identification techniques to be applied. Too high a degree of anonymization could make the data unusable, whereas too low a degree of anonymization can rule out de facto anonymization.

6.4.3 Levels of anonymization The EU data protection authorities propose a step-by-step approach to anonymization, i.e., different techniques should be combined (see also 5.3).29 The following order can be used:

• Removal of identifiers: First, the identifiers should be removed, i.e. all directly or indirectly identifying attributes should be erased.

• Randomization: The next step is to randomize the data sets in order to remove a direct link between the data and the data subjects.

• Generalization: In the last step, generalization and aggregation are used to reduce the accuracy of the data.

Depending on the specific use case and the susceptibility of the data sets to re-identification, however, a different order of de-identification techniques may be more expedient. In any case, the use of just a single de-identification technique will only rarely suffice to achieve a sufficient level of de-identification.

6.4.4 Review

After each de-identification step, it must be checked whether sufficient de facto anonymization has already been achieved and whether a "re-identification attacker" would accordingly have to invest a disproportionately large amount of time, money and manpower in order to carry out a re-identification. See section 6.5 below for details.

6.5 Regular review of the anonymization method Due to technical progress and possible changes in other relevant objective factors (see 5.2), the anonymization method used (i.e., the sum of the de-identification methods which together lead to effective de facto anonymization) must be regularly reviewed and – if necessary – updated. Information security is not to be considered a state, but rather a continuous improvement process (ISO/IEC 27001), and should be regularly reviewed based on the PDCA cycle (Plan-Do-Check-Act)30 or a comparable methodology. The PDCA system is also recommended for implemented or planned de facto anonymizations in order to regularly take stock of the security status. Since technical progress cannot realistically be forecast either in terms of de-identification techniques or future hardware performance (and, in the context of de facto anonymization, the controller initially only checks whether re-identification is unlikely at the time the anonymization method is carried out; see 5.2 and 6.2), the period between such reviews should, in case of doubt, be kept rather short.

29 Cf. also CNIL, available at https://www.cnil.fr/fr/lanonymisation-de-donnees-personnelles.

30 Cf. for instance GDD-Praxishilfe DS-GVO II, Verantwortlichkeiten und Aufgaben nach der DSGVO, page 7, available at https://www.gdd.de/downloads/praxishilfen/GDD-Praxishilfe_DS-GVO_2.pdf. The PDCA cycle describes a four-phase process that is used to control and continuously improve processes and products. Processes are initially planned (plan), tested (do), the test phase is evaluated (check) and, on this basis, the process is improved (act).



The technical guidelines of the German Federal Office for Information Security (BSI) on cryptographic procedures can also be used as a guide. These provide information about the reliability of forecasts for the security of cryptographic procedures and are therefore an indicator of the period over which a technology used (for anonymization) can be considered effective.31

Note: Audits by external, specialized service providers can also be used to review the effectiveness of the anonymization method used. To date, though, the known service providers have not yet offered reviews of the effectiveness of anonymization methods.

If the review and evaluation of the anonymization method leads to the result that it is no longer sufficiently effective and re-identification is possible, remedial action must be taken, as otherwise a risk for the rights and freedoms of the data subjects arises. This can mean that old methods can no longer be used – or at least no longer alone, without additional accompanying de-identification techniques – and must be replaced by new measures, or even that the entire anonymization method must be redesigned. In individual cases, it may also be necessary to de-identify anew, using the new anonymization method, data that were already de-identified with the "old" anonymization method, in order to be able to continue to use these sets of data in a de facto anonymized form.

Even if anonymized data has been disclosed to third parties or made publicly accessible, it must be deleted or replaced by newly anonymized data if it later becomes re-identifiable. This should be taken into account before disclosing de facto anonymized data. In addition, an erasure concept – even if this is not legally required for anonymized data – can create additional legal certainty and minimize risk when dealing with de facto anonymized data. The parameters for an exchange or a return/erasure of this data can, for example, be contractually regulated.

31 Empfehlungen und Schlüssellängen, available at https://www.bsi.bund.de/DE/Publikationen/TechnischeRichtlinien/tr02102/index_htm.html.


7. Organizational implementation of the de facto anonymization by third parties

The controller can carry out an effective de facto anonymization of personal data itself or use a service provider. If the original data set cannot be deleted (for example due to statutory retention obligations or because justified processing of the original data set is to continue), the involvement of a third party may reduce the risk of re-identification. This is because if the third party deletes its copy of the original data set and other users (i.e., users other than the party holding the original data set) are only provided with the anonymized data set, a further security threshold against possible re-identification has been created for the third party and these additional users. This procedure can also be a helpful organizational measure within a group of companies. In our opinion, however, the use of third parties is not mandatory, provided that internal structures (for example so-called Chinese walls) create organizational foundations that effectively exclude the consolidation of the information available in a company.32

32 See already 3.4.

7.1 Organizational measures

In addition to the technical requirements for de facto anonymization, companies must also take accompanying organizational measures to ensure that re-identification is prevented. Such measures include authorization concepts that describe access rules for users or user groups, clear data governance structures (e.g. including the use of independent control bodies) that regulate the handling of and access to data as well as anonymization through standards and guidelines, but also contracts and directives that prohibit and sanction the re-identification of anonymized data. With the help of contractual provisions in particular, a controller can agree with third parties on how to act in the event of (possibly unintentional) re-identification. This does not make the anonymization as such more effective, but it underlines and proves the honest efforts of the controller to achieve effective anonymization. In the context of discretionary decisions of a data protection authority and their possible legal consequences, for example, the authority may audit and evaluate these measures.

7.2 Third-party liability under data protection law

If a third party (i.e., an external service provider or an affiliated group company) is called in for the de facto anonymization of the data, the distribution of roles under data protection law must be determined in more detail according to the general criteria.

7.2.1 Processor If the third party only de-identifies personal data on the instructions of the controller, so that the commissioning company alone decides on the means and purposes of data processing, the third party is acting as a processor. Therefore, a data processing agreement would have to be concluded with the service provider within the meaning of Article 28 GDPR, and the other statutory requirements must also be complied with.

In practice, it may occur that a processor would like to reserve the right to anonymize personal data for its own purposes (e.g. for internal analysis and statistical purposes). If it is assumed that de facto anonymization constitutes data processing (see 3.2), the following problems arise with this approach:

• in this case, the processor does not act on the instructions of the client, so there is a risk that it could be classified as a joint controller,

• the de facto anonymization may trigger information obligations (see 9.1), and

• if the anonymized data should become re-identifiable later (for example due to technical progress) and thus become personal again, the data transmitted to the processor would have to be erased as well or replaced by "re-anonymized" data.

Therefore, this procedure and the corresponding consequences should be contractually excluded or clearly regulated between the parties (for example by agreeing on joint control for the activities in which the service provider does not act on behalf of the controller, if this constitutes joint control according to the GDPR, which also regulates the information and possible erasure obligations).



7.2.2 Joint control

The involvement of third parties can take place in such a way that the parties involved act as joint controllers for the processing (cf. Article 26 GDPR). This is the case if the (original) controller determines the purposes and means of data processing jointly with the third party. Example: The third party performing the de facto anonymization has an economic interest in the processing of personal data and would like to use the de facto anonymized data together with the controller to provide a service.

In this constellation, the joint controllers must define the respective data protection roles and responsibilities in an agreement in accordance with Article 26 GDPR.

7.2.3 (Separate) controllers The third party can also act as a (separate) controller. This is conceivable in cases in which the third party (further) processes the data in other ways and without the possibility of influence or other joint planning by the original controller. In view of the broad understanding of the courts and supervisory authorities regarding joint control within the meaning of Article 26 GDPR, such a constellation should only be possible in exceptional cases. In that case, namely, it must be ensured that the original and the new controller do not jointly decide on the purposes and means of data processing.

7.3 Special features of de-identification within the group/a group of companies

There is no "company group privilege" in the GDPR, which is why the above statements also apply to affiliated companies within the meaning of Section 15 of the German Stock Corporation Act (AktG). However, one particularity can result from the parent company's right to issue instructions within the group. Admittedly, the right to issue instructions can be a measure to ensure anonymization, for example if the group code of conduct qualifies re-identification attempts on otherwise anonymized data as a business ethics incident group-wide, with potentially drastic consequences. However, this alone is not enough for de facto anonymization. For example, if a subsidiary de-identifies the personal data of one or more group companies, the parent company frequently has the option of obtaining the original data set from its subsidiaries within the scope of its rights of instruction from a corporate law perspective (even if illegal instructions generally do not have to be complied with). In its decision on the classification of dynamic IP addresses, which was still handed down under the Data Protection Directive, the CJEU33 regarded these as personal data, provided that the processor (in this case a website operator) has legal means of restoring the personal reference. This is comparable with the situation in corporate groups, especially where there are shared IT systems. Accompanying the technical implementation, sufficient data governance measures should therefore ensure (if applicable also with the help of company law means) that the use or processing of data in a manner that does not meet the data protection requirements is not possible, even within the same organizational unit.

33 CJEU, judgment of 19 October 2016 – C-582/14 – Breyer.


8. Legality of de-identification measures

The GDPR does not regulate whether de facto anonymization requires its own justification or whether it triggers further obligations under the GDPR. The BDI is of the opinion that de facto anonymization is privileged overall in the GDPR and therefore does not require a separate legal basis.34 However, if one follows the opinion of the BfDI35 and understands anonymization as processing within the meaning of Article 4 no. 2 GDPR, the GDPR is also to be applied in full to this processing. Against this background, this guide – as an aid – also deals with the requirements for de facto anonymization from a data protection point of view, in particular with regard to possible bases of justification under the GDPR (see 3.2).

• Insofar as personal data are collected on the basis of consent, information about the de-identification measures must be provided with sufficient transparency in accordance with the transparency requirement in order to justify the de facto anonymization. If the transparency requirement is not met, the data subjects can be asked for further consent. Obtaining additional consent solely for the purpose of de facto anonymization, however, is hardly feasible in practice unless the anonymization process was already covered by the original consent: the data subject will generally have no interest in giving their consent again, and the practical implementation of the consent to be obtained would involve a high level of organizational and personnel effort. There are good reasons to argue that the implementation of de-identification measures can also be based on the other grounds of justification, provided that the data subjects are informed of this.36

• If personal data is collected from the outset for the purpose of de facto anonymization and the collection and processing is not based on consent, one of the grounds of justification in letters b) to f) of Article 6 (1) GDPR must apply. It should be noted that de-identification measures resulting in de facto anonymization regularly do not affect the rights and freedoms of data subjects, or affect them only insignificantly, so that the data subjects' interests will outweigh those of the controller only if other reasons or other processing occur.37 However, this does not mean that personal data may be collected from all available sources without limitation merely because the sole purpose is anonymization (see 8.2).

• If personal data is anonymized as a security measure (cf. Article 32 GDPR) or due to a request for erasure (cf. Article 17 GDPR), this is regularly done on the basis of a legal obligation to which the controller is subject and is justified in accordance with letter c) of Article 6 (1) GDPR.

• If personal data already collected for another purpose are to be de facto anonymized retroactively (and the initial collection was not based on the consent of the data subjects or on a legal provision of the Union or the member states within the meaning of Article 23 GDPR), the admissibility is governed by Article 6 (4) GDPR (see section 8.1 below for more details).

• In the case of the de facto anonymization of special categories of personal data, which are specially protected by law in accordance with Article 9 GDPR, the requirements of Article 9 (2) GDPR must also be met,38 although the anonymization process for special categories of personal data can in principle also be justified by Article 6 (4) GDPR (see point 8.1).

34 BDI e.V., opinion on the BfDI consultation procedure "Anonymization of personal data" dated 23 March 2020, available at: https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsulation-Anonymisierung-TK/Stellungnahmen/BDI.pdf?_blob=publicationFile&v=1.

35 The Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 5.

36 It is sometimes argued that consent with regard to the same processing activity has a kind of "blocking effect" that makes it inadmissible to rely on a different legal basis. Actually, though, anonymization represents an additional processing activity, for which a possible blocking effect based on consent granted does not apply. Since Article 6 (4) GDPR, which regulates the admissibility of the change of purpose, does not apply in these cases ("the processing for a purpose other than that for which the personal data have been collected is not based on the data subject's consent […]"), the admissibility in this case is governed by letters b) to f) of Article 6 (1) GDPR or, in the case of special categories of personal data, by Article 9 (2) GDPR.

37 In particular, innovations that arise through the use of data are also in the interest of the general public and therefore also in the interest of the persons affected by the de facto anonymization. In this respect, it could be argued that natural persons do not directly participate in the benefit of the evaluation of the data generated by their behavior, which can no longer be attributed to them following complete de-identification in the sense of de facto anonymization. However, the previous data subjects do not bear the costs of the innovations, either, and will ultimately benefit directly and indirectly from new technologies that arise from the analysis of de facto anonymized data: as possible users of these new technologies and indirectly through a sustainable and successful German and European data economy.

38 Voices in the literature partly discuss the possibility of a teleological reduction of the prohibition in Article 9 (1) GDPR, so that only the barriers of letter e) and letter f) of Article 6 (1) GDPR apply to the anonymization of special categories of personal data (cf. Hornung/Wagner in ZD 2020, 223), although this approach has not yet been confirmed by either courts or supervisory authorities.

8.1 Admissibility of de-identification measures in the event of a "change of purpose" As a rule, the personal data to be de-identified were collected for a purpose other than de facto anonymization. Article 6 (4) GDPR regulates the requirements for such a change of purpose. What is critical is that the original collection and processing of the personal data was lawful and that the purpose of the first processing and that of the second processing are compatible. For this test, Article 6 (4) GDPR contains a non-exhaustive list of criteria for assessing the compatibility of the purpose of collection and further processing. These criteria include "the existence of appropriate safeguards, which may include encryption or pseudonymization." The purpose of de-identification measures aimed at de facto anonymization will usually be that personal data can be analyzed after the de facto anonymization without affecting the rights of the data subjects. The consequence of the de facto anonymization is that subsequent further use of the data is "neutral" for the previous data subjects, since the personal reference has de facto been removed. Hence, de-identification measures that specifically serve to create a safeguard in accordance with letter e) of Article 6 (4) GDPR regularly constitute a permissible change of purpose. According to its wording, Article 6 (4) GDPR does not require that, in addition to the legal basis for the original collection, a separate legal basis pursuant to sentence 1 of Article 6 (1) GDPR must exist for the "further processing." According


to the likely prevailing opinion,39 this is not required in the case of a change of purpose in addition to the requirements of Article 6 (4) GDPR. Rather, the further processing in the event of a permissible change of purpose is legitimized by the permission criterion on which the original data processing was based. Recital 50 of the GDPR in particular speaks in favor of this, according to which "no legal basis separate from that which allowed the collection of the personal data is required." If, in addition to the requirements of Article 6 (4) GDPR, one were also to require a legal basis for the processing of personal data for the changed purpose in every case of a change of purpose, the regulation of Article 6 (4) GDPR and the strict separation made therein between (i) further processing for incompatible purposes on the basis of other legal bases and (ii) further processing for compatible purposes would not have been needed, because processing for a purpose other than the original purpose could then in any case always be based on a standard of permission under Article 6 (1) GDPR. A change of purpose according to Article 6 (4) GDPR is also possible for special categories of personal data. In the compatibility check under letter c) of Article 6 (4) GDPR,40 the need for protection of the data category concerned – in particular whether special categories of personal data in accordance with Article 9 GDPR are affected – must also be taken into account. Conversely, it follows that special categories of personal data may also be processed for other purposes in the context of a change of purpose. However, this requires a particularly careful examination of the connection with the original purpose and the reasonable expectations of the data subject as well as the necessary safeguards. This check should regularly be in the interest of the controller, since the anonymization of special categories of personal data is particularly useful for the protection and the interests of the data subject.

39 Cf., for example, the Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 6 et seq.; Article 29 Working Party, WP 216: Opinion 5/2014 on Anonymization Techniques, page 8; Ziegenhorn/von Heckel, NVwZ 2016, 1585 (1589); Taeger, in: Taeger/Gabel, GDPR/BDSG, 3rd Ed. 2019, Article 6 margin no. 145 et seq.; Schulz, in: Gola, GDPR, 2nd Ed. 2018, Article 6 margin no. 210; Rossnagel, in: Simitis/Hornung/Spiecker gen. Döhmann, Data Protection Law – GDPR with BDSG, 2019, Article 6 (4) margin no. 11; Monreal, ZD 2016, 507 (510); Kühling/Martini, EuZW 2016, 448 (451); Culik/Döpke, ZD 2017, 226 (330).

40 Cf. Heberlein, in: Ehmann/Selmayr/Heberlein, GDPR, 2nd Ed. 2018, Article 6 margin no. 58.

the interests of the data subjects, so that their data may no longer be stored and also may not be de-identified.

8.2 Obligations to review the legality of the (initial) collections

If data is not collected independently, but acquired, for example, as part of a company acquisition, the buyer becomes the new controller. If the buyer becomes aware that the personal data has been collected unlawfully, it is not allowed to process the data and it will usually have to erase it. However, if the buyer has no reason to assume that personal data was originally collected illegally, a de facto anonymization of the data and subsequent use of the de facto anonymized data is generally possible under the conditions set out in this guide.

Where personal data is not collected directly from the data subjects for the purpose of de facto anonymization, the controller cannot argue, even if per se no conflicting interests of the data subjects are apparent with regard to de facto anonymization (see Chapter 8), that personal data can be collected “freely” from all possible sources. On the one hand, the principle of transparency must be maintained and, even in the case of indirect data collection, the data subjects must be informed in accordance with Article 14 GDPR (see 9.1). On the other hand, the data source must also be taken into account when evaluating the admissibility of de-identification measures. If, for example, data is collected by so-called web crawlers (a computer program that automatically searches the World Wide Web and analyzes websites) in violation of the terms of use or statutory provisions, or if it is marketed by address dealers in a manner contrary to data protection legislation, the unlawful origin or collection cannot be “cured” by retroactive de facto anonymization. If the controller seeks to rely on legitimate interests in accordance with letter f) of Article 6 (1) GDPR, the aspects of data collection for the benefit of the data subjects are to be included in the balancing of interests, and will regularly lead to a predominance of


Example: Personal data is transferred as an asset or with an acquired company share in the context of a corporate acquisition, and there is no evidence for the buyer that the collection of the data was inadmissible. The acquirer must nevertheless be expected to conduct due diligence to verify whether the data was lawfully collected. Evidence of the original legality should be documented. If the legality of the initial collection cannot be clearly documented, the buyer should at least take all other reasonable steps to check the legality; for example, the buyer can have the seller warrant the legality of the collection. (See the sketch following the figure below for one possible form of such documentation.)

[Figure: In a corporate acquisition, a data set containing personal data is transferred to the purchaser either together with the company or company share (share deal) or as an individual asset (asset deal).]
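By way of illustration only, the outcome of such a due diligence exercise could be captured in a simple record per acquired data set. The following Python sketch is a hypothetical structure, not a prescribed format; all class, field and method names are assumptions made for this example:

```python
# Minimal illustrative sketch; the GDPR does not prescribe any format for
# such documentation, and every name and field below is a hypothetical choice.
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Evidence that an acquired data set was lawfully collected."""
    data_set: str                   # internal identifier of the acquired data set
    original_controller: str        # e.g., the target company that collected the data
    legal_basis: str                # e.g., "letter a) of Article 6 (1) GDPR (consent)"
    evidence: list[str] = field(default_factory=list)  # consent logs, notices, contracts
    seller_warranty: bool = False   # seller contractually warrants lawful collection

    def due_diligence_satisfied(self) -> bool:
        # Lawfulness should be positively documented; failing that, other
        # reasonable steps (such as a seller warranty) should be in place.
        return bool(self.evidence) or self.seller_warranty
```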


9. Further data protection requirements with regard to de facto anonymization

If it is assumed that the application of de-identification measures constitutes data processing (see 3.2), the additional data protection requirements described below must be observed in particular (this also applies to special categories of personal data).

9.1 Information obligations according to Articles 13, 14 GDPR

In accordance with Article 13 or 14 GDPR, the data subjects must be informed about the collection of personal data and the implementation of de-identification measures for de facto anonymization. The information in accordance with Article 13 (1) and (2) GDPR (if data was collected directly from the data subject) must be provided at the time the data is collected; the information in accordance with Article 14 (1) and (2) GDPR (if data was collected from third parties), on the other hand, must be provided within a reasonable period after the personal data have been obtained. If it is already clear at the time of the initial collection that the personal data is also to be anonymized, the information will likely have to include at least anonymization as a processing purpose, the data (categories) concerned and, if applicable, the way the de-identification works as well as other circumstances relating to the de facto anonymization. If it is decided at a later point in time that personal data already lawfully collected should be anonymized (see 8.1), the data subjects must, in principle, be informed about this change of purpose in accordance with Article 13 (3) and Article 14 (4) GDPR. At least in the case of the indirect collection of personal data, such information will regularly prove impossible or involve a disproportionate effort within the meaning of letter b) of Article 14 (5) GDPR; whether this is the case, however, must be checked and documented in detail. In the case of directly collected data, by contrast, neither the GDPR nor the BDSG expressly provides for such relief. From an evaluative standpoint, it could be argued that information is not required, at least in the case of retroactive de facto anonymization, since the rights and freedoms of the data subjects are usually not interfered with and information therefore does not have to be provided to maintain transparency. However, no statutory exception expressly provides for this, and no corresponding official opinions have been published to date. In any case, it is consistent with the interests involved not to impose disproportionate demands on the provision of information. Generic information about the fact that data is de facto anonymized, for example on the website of the controller, could therefore be sufficient in individual cases.

This should also apply, for example, to cases in which the data collected is only available to the controller in pseudonymized form and an identification of the data subjects – potentially not possible at all or only possible with additional risks for the data subjects – would have to be carried out solely for the purpose of notification. If the initial collection of personal data was based on consent, no information about a possible de facto anonymization was provided in the context of this consent, and the personal data is then to be de facto anonymized on the basis of another justification (e.g. letter f) of Article 6 (1) GDPR), higher requirements must regularly be placed on the information provided so that the principle of transparency is adequately maintained. The data subjects may rely on the fact that their data will not be processed for purposes other than those communicated to them and that the controller will not "tacitly" change the legal basis.41

41 Cf. the Article 29 Working Party guidelines on consent under Regulation (EU) 2016/679, WP259 rev. 01, page 27 (there in the event of a withdrawal of consent), endorsed by the European Data Protection Board.
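To make the timing rule and the minimum notice content outlined in 9.1 more tangible, the following Python sketch collects those elements in one hypothetical structure. The GDPR prescribes the content of the information, not any data format; all names below are assumptions made for illustration:

```python
# Illustrative sketch only; Articles 13 and 14 GDPR prescribe the content of
# the information, not a data structure. All names below are hypothetical.
from dataclasses import dataclass


@dataclass
class AnonymizationNotice:
    purpose: str                  # e.g., "de facto anonymization for statistical analyses"
    data_categories: list[str]    # the data (categories) concerned
    deidentification_method: str  # how the de-identification works, if applicable
    collected_directly: bool      # direct (Article 13) vs. indirect (Article 14) collection

    def timing(self) -> str:
        # Article 13 (1), (2): at the time of collection;
        # Article 14 (1), (2): within a reasonable period after obtaining the data.
        if self.collected_directly:
            return "provide at the time the data is collected (Article 13 GDPR)"
        return "provide within a reasonable period after obtaining the data (Article 14 GDPR)"
```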




9.2 Documentation

The process of de-identifying personal data leading to de facto anonymization should be documented for each use case. The GDPR itself does not contain an explicit documentation obligation, but Article 5 GDPR in conjunction with Recitals 74 and 78 suggests that the controller must be able to provide evidence of which technical and organizational measures have been taken.42 In particular, the documentation should be able to answer the following questions:

• Which categories of personal data are subject to de-identification;
• On what legal basis is the "initial" collection of personal data carried out and how were the data subjects informed;



• What is the specific purpose for which the personal data are de facto anonymized, on what legal basis does this take place, and what "robustness" of the data is required in order to be able to use the de facto anonymized data for the intended purpose;
• Which de-identification techniques are used and to what extent do they correspond to the current state of the art;
• In what form has it been, and is it continuously being, checked that the selected techniques lead to effective de facto anonymization.

42 Cf. Kompetenzzentrum für Öffentliche IT, Anonymisierung: Schutzziele und Techniken, page 20.
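One conceivable way to keep such documentation per use case is a structured record along the lines of the following Python sketch. It is illustrative only; the field names mirror the questions above but are otherwise assumptions, not requirements of the GDPR or of this guide:

```python
# Illustrative sketch of a per-use-case documentation record answering the
# questions listed above. All names are hypothetical choices for this example.
from dataclasses import dataclass


@dataclass
class DeidentificationDossier:
    data_categories: list[str]        # which categories of personal data are de-identified
    initial_legal_basis: str          # legal basis of the "initial" collection
    data_subjects_informed_via: str   # how the data subjects were informed
    anonymization_purpose: str        # specific purpose of the de facto anonymization
    anonymization_legal_basis: str    # legal basis on which the de-identification takes place
    required_robustness: str          # "robustness" needed for the intended purpose
    techniques: list[str]             # de-identification techniques used
    state_of_the_art_review: str      # how conformity with the state of the art was assessed
    effectiveness_checks: list[str]   # ongoing checks that the anonymization remains effective
```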



9.3 Data protection impact assessment

To date, the data protection authorities have only occasionally commented on the obligation to carry out a data protection impact assessment for anonymization. Whether a data protection impact assessment must be performed generally depends on the degree of probability of a high risk that the respective processing is likely to pose for natural persons. If such a high risk is likely, a data protection impact assessment must be carried out. In principle, this cannot be the case with the de-identification measures presented in this guide, provided that they concern "non-special" categories of personal data. However, some data protection authorities have adopted the practice of drawing up so-called "blacklists" (i.e., lists of processing operations for which a data protection impact assessment is required). In Germany, individual data protection authorities consider a data protection impact assessment to be necessary in the case of anonymization if special categories of personal data within the meaning of Article 9 (1) GDPR are anonymized for the purpose of disclosure to third parties.43

43 Cf. https://www.lda.bayern.de/media/dsfa_muss_liste_dsk_de.pdf.
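Purely as an illustration of the "blacklist" criterion just described, and not as a substitute for the case-by-case risk assessment the GDPR requires, the screening logic could be expressed as follows; the function and parameter names are hypothetical:

```python
# Minimal sketch of the blacklist criterion applied by individual German
# authorities. Hypothetical names; an actual DPIA decision requires a
# case-by-case legal review of the likely risk to natural persons.
def dpia_indicated(special_categories: bool, disclosed_to_third_parties: bool) -> bool:
    """True if special categories of data (Article 9 (1) GDPR) are anonymized
    for the purpose of disclosure to third parties."""
    return special_categories and disclosed_to_third_parties
```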


Imprint

Publisher
BDI – Federation of German Industries
Breite Straße 29
10178 Berlin
T.: +49 30 2028-0
www.bdi.eu

Originated from the BDI ad hoc working group "Anonymization of personal data" under the leadership of the chairman of the BDI working group for data management:
Carmen Schmidt, Volkswagen Group Info Service AG
Dr Guido Brinkel, Microsoft Deutschland GmbH

Editorial team
Dr Bertram Burtscher, Partner, Freshfields Bruckhaus Deringer LLP
Dr Christoph Werkmeister, Principal Associate, Freshfields Bruckhaus Deringer LLP
Dr Michael Dose, Senior Manager, Department Digitalization and Innovation
Ines Nitsche, Senior Manager, Department Law, Competition & Consumer Policy

Conception
Vicharah Ly, Senior Manager, Department for Marketing, Online and Event Management

Layout
Michel Nunez, Art Director, www.man-design.net

Printing
Das Druckteam, www.druckteam-berlin.de

Publishing Company
Industrie-Förderung Gesellschaft mbH, Berlin

Image credit
Cover: © 158449702 | Michail | stock.adobe.com
P. 7: © 136283864 | kiri | stock.adobe.com
P. 23: © 115457793 | dvoinik | stock.adobe.com
P. 37: © 188719057 | metelevan | stock.adobe.com

Date and Number
February 2021
BDI-Publications no. 0106


The BDI on Social Media
Follow our latest articles on social media. We appreciate Likes, Retweets and comments.

Twitter: @Der_BDI
LinkedIn: linkedin.com/company/bdi-bundesverband-der-deutschen-industrie-e-v-/
Facebook: www.facebook.com/DerBDI
Newsletter: english.bdi.eu/media/newsletters


