
Step 4: Compute Your Risk


Applicable to: Internal data sharing (de-identified data); Long-term data retention; Internal data sharing (anonymised data) or External data sharing; Synthetic data

k-anonymity [9] is a simple method [10, 11] for computing the re-identification risk level of a dataset. Records that share identical values across their indirect identifiers form a group, and the dataset's k-anonymity value is the number of records in the smallest such group. Generally, only indirect identifiers are considered when computing k-anonymity [12]. The smallest group is usually taken to represent the worst-case scenario when assessing the overall re-identification risk of the dataset. A k-anonymity value of 1 means that at least one record is unique, i.e. no other record shares its combination of indirect identifiers. A higher k-anonymity value indicates a lower risk of re-identification, while a lower value implies a higher risk.

Industry thresholds for k-anonymity are generally set at 3 or 5 [13]. Where possible, a higher threshold should be set to minimise re-identification risks. Refer to Chapter 3 (Anonymisation) of PDPC’s Advisory Guidelines on the Personal Data Protection Act for Selected Topics for the criteria for determining whether data may be considered sufficiently anonymised.
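The computation described above can be sketched in a few lines of Python. This is a minimal illustration, not a PDPC-provided tool: the function names `k_anonymity` and `meets_threshold`, the column names, and the default threshold of 5 are hypothetical choices made for this example. Records are grouped on their indirect-identifier values only, and the size of the smallest group is returned as the dataset's k value.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def k_anonymity(records: Iterable[Mapping[str, str]],
                indirect_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group of records sharing identical
    values across all of the given indirect identifiers."""
    # Key each record on its indirect-identifier values only; all other
    # columns are ignored when forming groups.
    group_sizes = Counter(
        tuple(record[column] for column in indirect_identifiers)
        for record in records
    )
    if not group_sizes:
        raise ValueError("cannot compute k-anonymity for an empty dataset")
    # The smallest group is the worst case, so it sets the dataset's k value.
    return min(group_sizes.values())


def meets_threshold(k: int, threshold: int = 5) -> bool:
    """Hypothetical helper: True if k is at or above the chosen threshold."""
    return k >= threshold


# Hypothetical records: postal_code and age are the indirect identifiers;
# "show" is not used for grouping.
example = [
    {"postal_code": "22xxxx", "age": "21 to 25", "show": "Emily in Paris"},
    {"postal_code": "22xxxx", "age": "21 to 25", "show": "Emily in Paris"},
    {"postal_code": "10xxxx", "age": "41 to 45", "show": "Brooklyn Nine-Nine"},
]

k = k_anonymity(example, ["postal_code", "age"])
print(k, meets_threshold(k, threshold=3))  # 1 False: the 10xxxx record is unique
```

Because a single unique record is enough to pull the whole dataset down to k = 1, the check fails even though two of the three records are indistinguishable from each other.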

The example below illustrates a dataset with three groups of identical records. The k value of each group ranges from 2 to 4. Overall, the dataset’s k-anonymity value is 2, reflecting the lowest value (highest risk) within the entire dataset [14].

Postal code | Age      | Favourite show                | Group k
------------+----------+-------------------------------+--------
22xxxx      | 21 to 25 | Emily in Paris                | k=2
22xxxx      | 21 to 25 | Emily in Paris                | k=2
10xxxx      | 41 to 45 | Brooklyn Nine-Nine            | k=4
10xxxx      | 41 to 45 | Brooklyn Nine-Nine            | k=4
10xxxx      | 41 to 45 | Brooklyn Nine-Nine            | k=4
10xxxx      | 41 to 45 | Brooklyn Nine-Nine            | k=4
58xxxx      | 56 to 60 | Attenborough’s Life in Colour | k=3
58xxxx      | 56 to 60 | Attenborough’s Life in Colour | k=3
58xxxx      | 56 to 60 | Attenborough’s Life in Colour | k=3

Overall k=2
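The example dataset can be checked with a short Python sketch. It treats all three columns as the grouping key, matching how the groups are drawn in the example (grouping on postal code and age alone gives the same groups here). The threshold of 3 used to flag risky groups is an assumption taken from the lower of the two industry values mentioned earlier; the variable names are hypothetical.

```python
from collections import Counter

# Rows from the example dataset: (postal code, age band, favourite show).
rows = [
    ("22xxxx", "21 to 25", "Emily in Paris"),
    ("22xxxx", "21 to 25", "Emily in Paris"),
    ("10xxxx", "41 to 45", "Brooklyn Nine-Nine"),
    ("10xxxx", "41 to 45", "Brooklyn Nine-Nine"),
    ("10xxxx", "41 to 45", "Brooklyn Nine-Nine"),
    ("10xxxx", "41 to 45", "Brooklyn Nine-Nine"),
    ("58xxxx", "56 to 60", "Attenborough's Life in Colour"),
    ("58xxxx", "56 to 60", "Attenborough's Life in Colour"),
    ("58xxxx", "56 to 60", "Attenborough's Life in Colour"),
]

# Count identical rows: each distinct row is one group, its count is the group's k.
group_k = Counter(rows)

# The dataset's overall k is the smallest group size (the worst case).
overall_k = min(group_k.values())

# Hypothetical threshold of 3: list the groups that fall below it.
THRESHOLD = 3
below_threshold = [group for group, k in group_k.items() if k < THRESHOLD]

for group, k in group_k.items():
    print(group[0], f"k={k}")
print(f"Overall k={overall_k}")
print("Below threshold:", below_threshold)
```

The per-group values come out as 2, 4 and 3, and the overall k is 2, driven entirely by the 22xxxx group. That group is also the only one flagged against a threshold of 3.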
