Further data protection requirements with regard to de facto anonymization
3 Terms
In legal and technical literature, the terms relating to anonymization are used inconsistently, which increases legal uncertainty with regard to GDPR-compliant anonymization. The most important terms are defined below and hereinafter only used in this sense in the context of this guide.
3.1 Personal data
Personal data means any information relating to an identified or identifiable natural person (cf. Article 4 no. 1 GDPR). What is crucial for the identifiability is whether the information can be attributed to a natural person directly or indirectly, e.g. by reference to an identifier such as a name, number, location or other attributes.
If, after a certain identifier (e.g. a name) has been removed, data can still be attributed to a natural person by reference to further identifiers (e.g. a job title and company, if this position only exists once in the company) or by consulting other additional information (e.g. an IP address together with the information held by the provider on the identity of the user behind it), this data continues to be personal.
Note:
In many cases, individual data cannot at first glance be clearly attributed to a data subject despite certain individual attributes, but only to a certain group of people. A natural person can nevertheless be identified if the group is too small (for example, re-identification is possible if a “female member of the executive board” of a group is mentioned and there are only one or two women on this executive board) or if a specific person can be identified when the data is combined with other (available) data (e.g. for an employee: salary level in connection with starting date; for a patient: gender, zip code and a rare diagnosis). In these cases, there is still a personal reference because a person is identifiable.
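The group-size effect described in the note can be illustrated with a short sketch. It is not part of the legal assessment: the records, the attribute names and the threshold `min_group_size` are hypothetical and chosen only for illustration. The function counts how many records share each combination of indirectly identifying attributes and reports the combinations shared by too few people.

```python
from collections import Counter


def small_groups(records, attributes, min_group_size):
    """Return attribute combinations shared by fewer than min_group_size records."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_group_size}


# Hypothetical records from which the names have already been removed
records = [
    {"gender": "f", "role": "executive board"},
    {"gender": "m", "role": "executive board"},
    {"gender": "m", "role": "executive board"},
    {"gender": "m", "role": "executive board"},
    {"gender": "f", "role": "staff"},
    {"gender": "f", "role": "staff"},
    {"gender": "f", "role": "staff"},
    {"gender": "m", "role": "staff"},
    {"gender": "m", "role": "staff"},
]

# Combinations that occur fewer than three times (threshold is an assumption)
risky = small_groups(records, ["gender", "role"], min_group_size=3)
```

A small group is only an indicator. Whether a person is actually identifiable still depends on which additional information is available in the individual case.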
3.2 Processing of personal data
Data protection law is linked to the “processing” of personal data. The term processing has a broad meaning and includes any operation that is “performed on” personal data (cf. Article 4 no. 2 clause 1 GDPR). In practice, there are hardly any conceivable operations in the handling of personal data that do not fall under this broad definition of processing. The (mere) collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction of personal data constitutes processing in each case (cf. Article 4 no. 2 clause 1 GDPR).
Opinions differ on whether the anonymization of personal data, i.e. the removal of the personal reference, itself constitutes data processing. One view argues that, since the anonymization process affects the personal reference of a set of data, as do the erasure or pseudonymization of data (cf. Article 4 nos. 2, 5 GDPR), it should be treated as processing within the meaning of the GDPR.3 The opposing view holds that anonymization is precisely not an operation relevant to data protection law, since the GDPR itself privileges anonymization and the process should therefore not be subject to its requirements.4 In the consultation process of the Federal Commissioner for Data Protection and Freedom of Information (BfDI), the BDI very clearly concurred with this opposing view.5 Nevertheless, given the broadly worded concept of processing and the absence of case law from the highest courts, no conclusively reliable statement can currently be made as to how anonymization is to be classified in relation to the legal definition of processing within the meaning of the GDPR.
3 The Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 5.
4 The anonymization of the data is typically also in the interests of the data subject or at least does not run counter to these interests, Hornung/Wagner, ZD 2020, 223 (224) with further references on the dispute.
5 BDI e.V., opinion on the BfDI consultation procedure “Anonymization of personal data” dated 23 March 2020, available at: https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsulation-Anonymisierung-TK/Stellungnahmen/BDI.pdf?_blob=publicationFile&v=1.
If one follows the opinion of the BfDI and treats anonymization as a form of processing within the meaning of Article 4 no. 2 GDPR6, all requirements of the GDPR must also be complied with for the anonymization process itself. Against this background, this guide also explains, at various points, possible legal bases and secondary obligations associated with processing operations.
3.3 Diversity of the term “anonymization”
In practice, various designations for the term “anonymization” are used alternatively and sometimes synonymously, some of which are based on different degrees of de-identification of sets of data and are not always synonymous with GDPR-compliant, i.e. sufficient, de facto anonymization. In some cases, anonymization is used as a generic term for all reductions in personal reference. In the context of this guide, however, “anonymization” only includes de-identification measures that lead to a removal of the personal reference in the sense that the data protection regulations are no longer applicable. These are the methods of de facto anonymization described below. Absolute anonymization would also represent “GDPR-compliant” anonymization, but it is neither required by the GDPR nor can it be implemented in the vast majority of cases.7
3.3.1 De facto anonymization
De facto anonymization (sometimes also referred to as relative anonymization) describes de-identification operations by which so many identifiers are removed and further techniques (see 6.1) to reduce personal reference (e.g. randomization or generalization) are applied that re-identification with reasonable efforts in accordance with the current state of the art (see 3.3.4) is no longer possible and the personal reference is eliminated (see Chapter 5 for more details). For the term “reasonable efforts” see 5.2.
3.3.2 Absolute anonymization
Absolute anonymization exists if the de-identification leads to the complete loss of any personal reference and re-identification is absolutely excluded for everyone, including from the overall context and with the greatest possible effort. This covers all techniques that are even theoretically conceivable and all possible additional information, regardless of the state of the art, the probability of re-identification, the costs, the required effort and the duration. In most cases, however, due to the diverse (digitally available) data sources, today’s information technology with its increasingly easier linkability of data and the continuous increase in the available computing power, the absolute irreversible loss of any personal reference appears impossible. In particular, personal data that leave traces in the digital world in the form of a digital footprint caused by use of the internet or even a mobile phone can hardly be removed completely. At least if purely theoretically conceivable re-identification techniques are taken into account, personal data can almost never be anonymized in practice in such a way that a restoration of the personal reference can be absolutely excluded (see also 5.1).
Example:
Absolute anonymization can be assumed, for example, when data from a very large number of customers from all population groups and various geographical regions (e.g. in a global survey) are aggregated. Suppose customers are asked about their satisfaction with a product, the only answer options are “thumbs up” and “thumbs down,” each option receives a sufficiently large number of answers, and the individual answers are completely destroyed after the evaluation (i.e. including all copies and any interim results). The survey result (for example, 58% of customers worldwide voted thumbs up) would then no longer be traceable to an identified person under any circumstances.
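The aggregation in this example can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `aggregate_votes`, the threshold `min_responses` and the sample votes are hypothetical, and clearing a list in memory only stands in for the complete destruction of all copies and interim results that the example requires.

```python
def aggregate_votes(votes, min_responses):
    """Return the share of 'up' votes in percent, or None if there are too few answers."""
    total = len(votes)
    if total < min_responses:
        return None  # too few answers to publish an aggregate
    ups = sum(1 for v in votes if v == "up")
    return round(100 * ups / total)


# Hypothetical individual answers (no names, only the vote itself)
votes = ["up"] * 58 + ["down"] * 42

result = aggregate_votes(votes, min_responses=50)

# Only the aggregate is kept; the individual answers are discarded
votes.clear()
```

Only `result` survives the evaluation. Whether this amounts to absolute anonymization in a real system depends on whether every copy of the individual answers has in fact been destroyed.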
6 The Federal Commissioner for Data Protection and Freedom of Information, position paper on anonymization under the GDPR with special consideration of the telecommunications industry, valid as of: 29 June 2020, page 5.
7 See in detail regarding the legal requirements for anonymization in section 5.1.
3.3.3 Formal anonymization
The weakest form of de-identification is the removal or omission of directly identifying attributes from the set of data, such as the real name or a personalized email address. The other attributes (for example so-called quasi-identifying attributes such as a user ID, ID number, VIN, etc.) are in contrast retained. This technique is also known as formal anonymization and regularly does not lead to GDPR-compliant de facto anonymization. Rather, formal anonymization is often just a form of pseudonymization, since reference to the removed identifying feature still allows the data to be attributed to a natural person. Formal anonymization is therefore generally considered inadequate and is limited, for example in the German Federal Statistics Act (BStatG)8, to certain groups of recipients. Section 16 para. 6 no. 2 BStatG only grants access to formally anonymized individual information within specially secured areas of the Federal Statistical Office and the statistical offices of the states (Länder) if effective precautions are taken to maintain confidentiality. Authorized persons can only be public officials who are specially sworn to public service. Further measures are necessary in order to at least make re-identification on the basis of so-called quasi-identifying attributes, and the detection of identifying features, more difficult.9
Example:
As part of a customer satisfaction survey, the answers are tabulated with names. Only the column with the name is then deleted from the table. In these cases, it is possible that the information provided by the customers (e.g. information on name, address, etc. in a free text field) and further information in the table (e.g. customer number or product number) in combination with further information (e.g. overview of transactions with date and product number; the delivery address is a single-family house in which only one person lives) will permit the attribution of individual details to specific customers.
8 Law on Statistics for Federal Purposes (Federal Statistics Act, BStatG) in the version promulgated on 20 October 2016 (Federal Law Gazette I page 2394).
9 Jan Dennis Gumz, Mike Weber, Christian Welzel (Kompetenzzentrum für Öffentliche IT [Competence Center for Public IT]), Anonymisierung: Schutzziele und Techniken [Anonymization: protection goals and techniques], page 10, available at https://cdn0.scrvt.com/fokus/784daae14fc72f91/bcebf7142066/Anonymisierung---Schutzzieleund-Techniken.pdf.
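Why deleting only the name column is insufficient in the customer survey example can be shown with a short sketch. All data, field names and the helper `link` are hypothetical; the sketch merely assumes that the retained customer number also appears in a second table that still contains names.

```python
# Survey table after deleting only the name column (hypothetical data)
survey = [
    {"customer_no": "C-1001", "product_no": "P-7", "rating": 2},
    {"customer_no": "C-1002", "product_no": "P-7", "rating": 5},
]

# Additional information available elsewhere, e.g. an overview of transactions
orders = [
    {"customer_no": "C-1001", "name": "A. Example"},
    {"customer_no": "C-1002", "name": "B. Sample"},
]


def link(survey_rows, order_rows, key):
    """Attach names from order_rows to the survey rows that share the same key value."""
    names = {o[key]: o["name"] for o in order_rows}
    return [{**s, "name": names[s[key]]} for s in survey_rows if s[key] in names]


# Every survey answer is attributed to a named customer again
linked = link(survey, orders, "customer_no")
```

A single retained quasi-identifying attribute is enough here to restore the personal reference, which is why formal anonymization regularly amounts only to pseudonymization.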
3.3.4 State of the art
The state of the art within the meaning of the GDPR includes the recognized rules of technology that have already spread and been proven in practice.
The term can be found not only in the GDPR, but also in national regulations such as Section 109 para. 1 of the German Telecommunications Act (TKG), Section 13 para. 7 of the German Telemedia Act (TMG) and Section 8a para. 1 sentence 2 of the German Act to Strengthen the Security of Federal Information Technology (BSIG). To determine the state of the art, recourse is had to approaches regarded as state of the art or to the related best available techniques. The term “state of the art” is to be understood dynamically within the GDPR, since the state of the art is constantly evolving. The obligation to take into account the state of the art is accordingly to be construed as an upper limit; the latest scientific trends and prototypes do not have to be taken into account as soon as they are discussed in the specialist community; rather, what matters is whether techniques have become established in practice. Because the term is dynamic, the controller or the processor cannot assume that the state of the art will remain static. It is therefore necessary to regularly review the anonymization method (see 6.4). Standards can also be used for such a review: by resorting to measures, techniques and processes that expert panels have already assessed as appropriate, controllers and processors can meet their own due diligence obligations.10 The following standards related to anonymization are currently being developed or have already been published:11
- ISO/IEC 20889:2018 “Privacy enhancing data de-identification terminology and classification of techniques”;
- ISO/IEC 27555 “Guidelines on personally identifiable information deletion” (expected publication: December 2021).
10 DIN Standards Committee for Information Technology and Applications, opinion on the BfDI consultation procedure for anonymization under the GDPR with special consideration of the telecommunications industry, available at https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsultation-Anonymisierung-TK/Positionspapier-Anonymisierung-GDPR-TKG.html;jsessionid=F53DFE83354223C285E7093AF6EC59B4.2_cid507?nn=5216976.
11 DIN Standards Committee for Information Technology and Applications, opinion on the BfDI consultation procedure for anonymization under the GDPR with special consideration of the telecommunications industry, available at https://www.bfdi.bund.de/DE/Infothek/Transparenz/Konsultationsverfahren/01_Konsultation-Anonymisierung-TK/Stellungnahmen/DIN.pdf;jsessionid=B5D45D1FD485BC3389BEFD856DE7F466.2_cid329?_blob=publicationFile&v=1.
3.4 De-identification
De-identification describes the process that results in the reduction or ultimately even the removal of the personal reference of previously personal data. The result of a de-identification can also be a “mere” pseudonymization, i.e. where the personal reference can still be re-established (albeit with a certain amount of effort). The more far-reaching the de-identification, the more likely it is that a GDPR-compliant de facto anonymization can ultimately be assumed. Whenever methods or techniques of anonymization are mentioned in this guide, methods or techniques of de-identification are meant.
3.5 Re-identification
In contrast to de-identification, re-identification stands for the restoration of the personal reference, i.e. the reversal of the process that led to the reduction or removal of the identifier. A re-identification is not possible with absolute anonymization, and with de facto anonymization only with a disproportionate amount of effort or after further development of the state of the art.
3.6 Pseudonymization and delimitation from de facto anonymization
Pseudonymization refers to a process of de-identification, as a result of which data can only be attributed to a specific data subject with the use of additional available information (the original data set or a “key”) (cf. Article 4 no. 5 GDPR). The personal reference is therefore not irreversibly removed; rather, the data is only (temporarily) “decoupled” from the associated data subjects and the personal reference is limited, but the natural persons behind the information can still be identified (via the key).
Example:
In a document, real names are replaced by numbers. There is a second separate document that contains a list of which number represents which name.
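The example can be sketched in code as follows. This is an illustration only: the function `pseudonymize`, the field names and the sample rows are hypothetical, and random tokens are used here in place of the numbers mentioned in the example.

```python
import secrets


def pseudonymize(rows, name_field):
    """Replace names by random pseudonyms; return the new rows and the separate key table."""
    key_table = {}  # pseudonym -> real name, to be stored separately
    by_name = {}    # real name -> pseudonym, so that repeated names map consistently
    out = []
    for row in rows:
        name = row[name_field]
        if name not in by_name:
            pseudonym = secrets.token_hex(4)
            while pseudonym in key_table:  # avoid the (unlikely) collision
                pseudonym = secrets.token_hex(4)
            by_name[name] = pseudonym
            key_table[pseudonym] = name
        out.append({**row, name_field: by_name[name]})
    return out, key_table


# Hypothetical document content
rows = [
    {"name": "A. Example", "dept": "Sales"},
    {"name": "B. Sample", "dept": "IT"},
    {"name": "A. Example", "dept": "Sales"},
]

pseudonymized, key_table = pseudonymize(rows, "name")

# Anyone holding the key table can restore the personal reference
restored = [{**r, "name": key_table[r["name"]]} for r in pseudonymized]
```

The sketch shows why the data remains personal: as long as the key table exists and is accessible, the original attribution can be restored in a single step.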
Due to the possibility of re-identifying the data without great effort (e.g. by obtaining a re-identification key), pseudonymization is not a form of de facto anonymization. Rather, pseudonymization is a technical and organizational measure to protect personal data that prevents or at least restricts the unhindered attribution of the sets of data to data subjects, in particular by third parties (for example, by only granting a certain selected number of people access to a re-identification key) (cf. Article 32 para. 1 lit. a GDPR). Pseudonymization therefore includes de-identification measures that remove the direct personal reference in the set of data, but at the same time keep the possibility of subsequent attribution (for example, use of initials instead of names, etc.). Through the additional use of further anonymization techniques (see Chapter 6), but also additional organizational measures (e.g. restriction of access to the de-identification methods), pseudonymized data can be de facto anonymized. Whether a data item is personal, anonymous or pseudonymous depends greatly on the circumstances and framework conditions in the individual case. The same data item can therefore be classified differently by different data controllers.
Example:
A set of data that contains the IP addresses of device users is personal for the telecommunications service provider, as the provider can make the attribution. For other persons, however, the same data set is (at least) pseudonymized, since it cannot be attributed to specific persons without additional information.