Anonymization of personal data


6 Technical requirements for effective de facto anonymization

The effective implementation of de facto anonymization (i.e. the fulfillment of certain formalized anonymity criteria) depends on the anonymization technique(s) used. The GDPR itself does not specify which anonymization techniques should be used. This section presents some of the common de-identification techniques and recommendations for checking their effectiveness.

6.1 Overview of de-identification techniques

Many techniques exist for de-identifying personal data. Depending on the methodological approach and the assumed “re-identification attack model”, they satisfy certain formalized anonymity criteria (for example, k-anonymity, l-diversity, t-closeness or differential privacy). Which de-identification technique, or combination of techniques, can guarantee sufficient de facto anonymization must always be assessed in light of the specific individual case.
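As an illustration of what checking a formalized anonymity criterion can look like, the following minimal sketch computes the k of k-anonymity for a small data set: the size of the smallest group of records that share the same quasi-identifier values. The function name `k_anonymity`, the attribute names and the sample records are assumptions made for this example and do not come from the text above.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of quasi-identifier values (0 for an empty data set)."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical sample data: two quasi-identifiers and one sensitive attribute.
records = [
    {"birth_year": 1980, "zip": "101**", "diagnosis": "flu"},
    {"birth_year": 1980, "zip": "101**", "diagnosis": "asthma"},
    {"birth_year": 1975, "zip": "102**", "diagnosis": "flu"},
]

k = k_anonymity(records, ["birth_year", "zip"])
print(k)
```

Here the third record is the only one with its combination of birth year and postal code, so the data set is only 1-anonymous. A value of k alone does not establish de facto anonymization; it is one criterion among those named above.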

6.1.1 Removal of identifiers

Personal data can consist of identifying attributes (e.g. name or identity card number), quasi-identifying attributes (e.g. date of birth, place of residence or gender) and sensitive attributes (e.g. illnesses, sexual tendencies, very old age, etc.). In this context, the term “sensitive attribute” is not to be equated with the special categories of personal data within the meaning of Article 9 (1) GDPR. An attribute is considered sensitive if the disclosure of its content and its attribution to a person give rise to a particular risk of invasions of privacy (this also includes, for example, bank details, social security numbers or photographs).[23]

Data can be de-identified by removing the identifying and quasi-identifying attributes. In this case, one or several of these attributes (i.e. identifiers) are completely deleted from a data set, so that conclusions about an individual person are no longer possible or at least become very difficult. However, removing these identifiers is usually only the first step towards de facto anonymization.

Example: The user's name and the user and vehicle numbers are deleted from GPS location data generated by vehicles. As a result, the GPS location data can be traced back to a single person only with difficulty (and possibly only with corresponding additional knowledge).
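The GPS example can be sketched in a few lines of code. The field names (`user_name`, `user_number`, `vehicle_number`, `lat`, `lon`, `timestamp`) and the helper `remove_identifiers` are hypothetical and chosen only for illustration.

```python
def remove_identifiers(records, identifiers):
    """Return copies of the records without the listed identifier fields.
    The input records are left unchanged."""
    drop = set(identifiers)
    return [
        {key: value for key, value in record.items() if key not in drop}
        for record in records
    ]


# Hypothetical GPS record as it might be generated by a vehicle.
gps_records = [
    {
        "user_name": "A. Example",
        "user_number": 17,
        "vehicle_number": "V-001",
        "lat": 52.52,
        "lon": 13.40,
        "timestamp": "2020-01-01T08:00:00",
    },
]

deidentified = remove_identifiers(
    gps_records, ["user_name", "user_number", "vehicle_number"]
)
print(deidentified)
```

The remaining coordinates and timestamps can still act as quasi-identifiers (for example, a frequently visited home address), which is why the removal of identifiers is usually only a first step.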

6.1.2 Randomization

Randomization or perturbation (i.e. a type of “disturbance”) refers to techniques (a selection of which is described below under 6.1.2.1 to 6.1.2.6) in which data values are replaced by artificially generated values in order to “alter” or “perturb” a data set in such a way that the direct link between certain data and the data subjects is removed. The data should be altered only to such an extent that at least the statistical properties of the data set are retained for analysis purposes.
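The trade-off between altering individual values and retaining statistical properties can be illustrated with a simple sketch. It uses additive zero-mean noise, which is one possible perturbation and an assumption of this example rather than a technique prescribed by the text. The function name `perturb`, the `scale` parameter and the sample values are likewise hypothetical.

```python
import random
from statistics import mean


def perturb(values, scale, seed=None):
    """Add Gaussian noise to each value, then subtract the average noise so
    that the mean of the data set is preserved (up to floating-point error)."""
    if not values:
        return []
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, scale) for _ in values]
    offset = mean(noise)
    return [value + n - offset for value, n in zip(values, noise)]


# Hypothetical numeric attribute, e.g. ages of data subjects.
ages = [34.0, 51.0, 29.0, 46.0, 62.0]
perturbed = perturb(ages, scale=2.0, seed=1)

print(mean(ages), mean(perturbed))
```

Each individual value changes, while the mean stays the same apart from rounding. Other statistical properties, such as the variance, are only approximately retained, and a larger `scale` obscures individual values more strongly at the cost of analytical accuracy.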

6.1.2.1 Data swapping

In swapping, certain attributes of one data subject are artificially swapped with those of another person. Ideally, this happens randomly or pseudo-randomly,[24] and it must be ensured that no data record ultimately reproduces itself, i.e. ends up with its own original values. The technique can be improved if the variables of a specific person do not exactly match the variables of the other person.
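A minimal sketch of data swapping is shown below. To make sure that no record receives its own value back, it uses Sattolo's algorithm, which produces a random permutation consisting of a single cycle and therefore has no fixed points. This choice of algorithm, the function name `swap_attribute` and the sample records are assumptions of this example. The `seed` parameter plays the role of the key material of a pseudo-random procedure.

```python
import random


def swap_attribute(records, attribute, seed=None):
    """Return copies of the records in which the given attribute has been
    reassigned so that no record keeps the value from its own position."""
    n = len(records)
    if n < 2:
        raise ValueError("at least two records are needed for swapping")
    rng = random.Random(seed)
    # Sattolo's algorithm: the result is a single-cycle permutation,
    # so no index is mapped to itself.
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, never i itself
        perm[i], perm[j] = perm[j], perm[i]
    return [
        {**record, attribute: records[perm[i]][attribute]}
        for i, record in enumerate(records)
    ]


# Hypothetical records with distinct postal codes.
people = [{"id": i, "zip": str(10000 + i)} for i in range(6)]
swapped = swap_attribute(people, "zip", seed=42)
print(swapped)
```

The set of postal codes in the data set is unchanged, so counts per postal code are retained. If several records share the same value, a record may still receive a value identical to its original one from another record. Ruling that out would require an additional check on the values themselves.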

[23] In general on these concepts, Dietmar Hauf, page 8, available at: https://dbis.ipd.kit.edu/img/content/SS07Hauf_kAnonym.pdf.

[24] Pseudorandomness is a calculated randomness. It looks like “real” randomness to the observer but can be reversed with knowledge of the key material.

