
Step 4: Compute Your Risk
Applicable to: Internal data sharing (de-identified data); Long-term data retention; Internal data sharing (anonymised data) or External data sharing; Synthetic data
k-anonymity[9] is a straightforward method[10,11] for computing the re-identification risk level of a dataset. It refers to the size of the smallest group of records in a dataset that share identical values. The smallest group is usually taken to represent the worst-case scenario when assessing the overall re-identification risk of the dataset. A k-anonymity value of 1 means that at least one record is unique. Generally, only indirect identifiers are considered when computing k-anonymity.[12]
A higher k-anonymity value means a lower risk of re-identification, while a lower value implies a higher risk. The industry threshold for the k-anonymity value is generally 3 or 5.[13] Where possible, a higher threshold should be set to minimise re-identification risk. Refer to Chapter 3 (Anonymisation) of PDPC’s Advisory Guidelines on the Personal Data Protection Act for Selected Topics for the criteria for determining whether data may be considered sufficiently anonymised.
The above diagram illustrates a dataset with three groups of identical records. The k value of each group ranges from 2 to 4. Overall, the dataset’s k-anonymity value is 2, reflecting the lowest value (highest risk) within the entire dataset.[14]
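The computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: the function name `k_anonymity`, the column names, and the records are all hypothetical and are not the data shown in the diagram. They are chosen only so that the three groups contain 4, 3 and 2 records.

```python
from collections import Counter


def k_anonymity(records, indirect_identifiers):
    """Return (k, groups) for a list of dict records.

    Records are grouped by their values for the indirect identifiers;
    k is the size of the smallest group (the worst case).
    """
    if not records:
        raise ValueError("dataset is empty")
    groups = Counter(
        tuple(record[column] for column in indirect_identifiers)
        for record in records
    )
    return min(groups.values()), groups


# Hypothetical dataset: three groups of identical records (sizes 4, 3, 2).
records = (
    [{"age_band": "21-30", "postal_sector": "12", "gender": "F"}] * 4
    + [{"age_band": "31-40", "postal_sector": "34", "gender": "M"}] * 3
    + [{"age_band": "41-50", "postal_sector": "56", "gender": "F"}] * 2
)

k, groups = k_anonymity(records, ["age_band", "postal_sector", "gender"])

# Assumed threshold for illustration; the text cites 3 or 5 as typical.
THRESHOLD = 3
meets_threshold = k >= THRESHOLD

print(f"k-anonymity value: {k}")              # smallest group has 2 records
print(f"meets threshold of {THRESHOLD}: {meets_threshold}")
```

Here the smallest group has 2 records, so the dataset's k-anonymity value is 2 and it would not meet a threshold of 3. Further anonymisation of the indirect identifiers would be needed before the check passes.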