Data anonymization is an important tool for organizations to protect the personal data of individuals, while averting the onerous requirements of the EU and U.K. General Data Protection Regulations. Unfortunately, guidance on this subject is often unclear, with standards for anonymization differing among jurisdictions. This article provides privacy practitioners with a concise guide to understanding these divergent approaches. It further discusses ways in which the European Data Protection Board, due to adopt anonymization guidelines as part of its 2021/2022 work programme, could unify them in a manner that protects the interests of businesses and consumers alike.
Why anonymize?
As business leaders and privacy practitioners alike can attest, complying with the EU GDPR is time-consuming and expensive. Further, failure to comply can result in multimillion-dollar fines. One option for minimizing the risk of noncompliance is anonymization. Under GDPR Article 4(1), information constitutes personal data only if it relates to an identified or identifiable natural person. If data is neither identified nor identifiable, it is not personal data (it is anonymized) and, as stated in Recital 26, it accordingly falls outside the scope of the GDPR.
How to anonymize?
Data is anonymized when it no longer relates to an identifiable natural person and is not reasonably susceptible to reidentification. This requires more than replacing names and other identifiers in a data set with placeholders. Doing so merely pseudonymizes the data. That is a useful security measure, but it falls short of anonymization. As the Article 29 Working Party observed in Opinion 05/2014, data stripped of conspicuous identifiers still presents several forms of reidentification risk, including:
- Singling out: Where a distinctive attribute, or combination of attributes, can be used to isolate an individual's record, e.g., if a data set indicates an individual is three meters tall.
- Linkability: Where information in one data set can be linked with information in another data set in a manner that poses a risk of reidentification, e.g., if human resources records can be attributed to individuals using information available on LinkedIn.
- Inference: Where a link can be inferred between two fields in a data set, e.g., employee seniority and salary, in a manner that poses a risk of reidentification.
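The gap between pseudonymization and anonymization can be made concrete with a small sketch. The Python below is illustrative only. The records are invented, and the helper names (`replace_names`, `singled_out`) and the choice of `height_cm` and `dept` as quasi-identifiers are assumptions, not anything prescribed by the Working Party. It swaps names for placeholders and then flags records that can still be singled out because their combination of remaining attributes is unique.

```python
from collections import Counter

# Invented example records (not real data). "height_cm" and "dept" are treated
# as quasi-identifiers: not names, but potentially distinctive attributes.
RECORDS = [
    {"name": "Alice", "height_cm": 165, "dept": "Sales"},
    {"name": "Bob", "height_cm": 178, "dept": "Sales"},
    {"name": "Chen", "height_cm": 300, "dept": "IT"},
    {"name": "Dana", "height_cm": 165, "dept": "Sales"},
]


def replace_names(records):
    """Swap each name for a sequential placeholder (simple pseudonymization)."""
    out = []
    for i, r in enumerate(records, start=1):
        rest = {k: v for k, v in r.items() if k != "name"}
        out.append({"id": f"P{i:03d}", **rest})
    return out


def singled_out(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifier values is unique."""

    def combo(r):
        return tuple(r[q] for q in quasi_identifiers)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]


if __name__ == "__main__":
    pseudo = replace_names(RECORDS)
    for r in singled_out(pseudo, ["height_cm", "dept"]):
        print(r)
```

With these invented records, no names survive, yet P002 and P003 (the three-meter-tall employee) are still flagged as unique. Only the two identical 165 cm sales records blend into each other.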
According to the Working Party, two strategies to combat these risks are generalization and randomization. Generalization modifies the scale of the data (e.g., replacing exact ages with age bands) so that individuals are grouped together. Randomization randomly alters the content of the data within a set range. Both make reidentification more difficult. Although this provides a general framework for anonymization, the exact steps necessary to anonymize data vary by jurisdiction.
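The two strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the records are invented, the 10-year age band, the postcode truncation and the ±2,000 salary noise range are arbitrary choices, and the helper names are hypothetical. The sketch also uses a k-anonymity check, which measures the size of the smallest group of records sharing the same quasi-identifier values, as one simple way to see whether generalization has grouped individuals.

```python
import random
from collections import Counter

# Invented example records (not real data). "age" and "postcode" are treated
# as quasi-identifiers; "salary" is the attribute being released.
RECORDS = [
    {"age": 23, "postcode": "AB1 2CD", "salary": 41000},
    {"age": 27, "postcode": "AB1 7EF", "salary": 43500},
    {"age": 34, "postcode": "XY9 3GH", "salary": 58000},
    {"age": 38, "postcode": "XY9 8JK", "salary": 61000},
]


def generalize(record, band=10):
    """Generalization: exact age -> age band, full postcode -> its first half."""
    low = (record["age"] // band) * band
    return {
        "age_band": f"{low}-{low + band - 1}",
        "area": record["postcode"].split()[0],
        "salary": record["salary"],
    }


def randomize(record, spread, rng):
    """Randomization: shift salary by uniform noise within [-spread, +spread]."""
    noisy = dict(record)
    noisy["salary"] = round(record["salary"] + rng.uniform(-spread, spread))
    return noisy


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only so the demo is repeatable
    released = [randomize(generalize(r), spread=2000, rng=rng) for r in RECORDS]
    print("k before:", k_anonymity(RECORDS, ["age", "postcode"]))
    print("k after: ", k_anonymity(released, ["age_band", "area"]))
    for r in released:
        print(r)
```

Before generalization, every record is unique (k = 1). Afterward, each age band and area pair covers two people (k = 2). Neither technique on its own is likely to address every risk listed above. In practice, the band widths, noise range and target group size would need to reflect the context and sensitivity of the data.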
Anonymization standards by jurisdiction
The EU and the Working Party
In analyzing the GDPR’s predecessor, the 1995 Data Protection Directive, the Working Party adopted anonymization guidance in Opinion 04/2007. That opinion interpreted Recital 26 of the directive, which provided that the anonymization test should consider whether reidentification is feasible, taking into account “all the means likely reasonably to be used either by the controller or by any other person to identify the said person.” The Working Party found this reflects a practical standard that asks how likely reidentification is, given the data at issue and contextual considerations such as the cost of reidentification and the potential harm to data subjects. “A mere hypothetical possibility” of singling out an individual does not preclude anonymization.
In its subsequent Opinion 05/2014, also issued under the directive, the Working Party largely echoed this analysis. However, it introduced an odd quirk, noting that “when a data controller does not delete the original (identifiable) data at event-level, and the data controller hands over part of this dataset (for example after removal or masking of identifiable data), the resulting dataset is still personal data.” In other words, the Working Party concluded that a data set cannot be anonymized if the controller retains the information from which it is derived, even if that information is privately and securely stored.
This poses obvious practical challenges. When businesses anonymize data, they are often not in a position to delete the source data from which it is derived. For example, a real estate company or credit bureau may wish to prepare a generalized report on a broad tranche of consumers but would not be able to delete the underlying records without severe commercial repercussions.
The Working Party’s data-deletion rule also conflicts with the basic principles of the GDPR, which imposes no such requirement. Rather, the GDPR aims to regulate data that could reasonably be associated with a data subject. Such a risk of association may exist where individuals in a data set could be reidentified with publicly accessible information. In contrast, a risk of association may not exist if those individuals could only be reidentified with encrypted information in private storage. Ultimately, as reflected in Recital 26 of the GDPR, anonymization should inquire as to the means reasonably likely to be used to identify a person rather than imposing a categorical requirement to delete all information on which an anonymized data set is based.
Germany, the CJEU and the Breyer Judgment
A 2016 ruling by the Court of Justice of the European Union introduces a related question: should the potential for reidentification be evaluated from the perspective of the party holding the data, i.e., a relative approach, or from the perspective of third parties, i.e., an absolutist approach? In Breyer, a case referred to the CJEU by the German Federal Court of Justice, the CJEU concluded that dynamic IP addresses could constitute personal data, even though the information necessary to link them to individuals was held by a third party (in this case, an internet service provider) and was not immediately accessible to the controller, the German government. The court’s rationale tracked Opinion 05/2014: it found the data was not anonymized because it could be reidentified using other information, even though that information was privately held. This suggests an absolutist approach to evaluating reidentification risk, which some supervisory authorities have followed. For example, France’s data protection authority, the Commission nationale de l’informatique et des libertés (CNIL), has noted that anonymization requires making “identification practically impossible.”
Nevertheless, the Breyer judgment has significant limits. In particular, the CJEU noted the German government had legal channels available to require the internet service provider to produce information linking individuals to their IP addresses. Accordingly, in Mircom International v. Virgin Media, the England and Wales High Court noted Breyer’s holding depended “on specific factual aspects of the German legal system.” It further concluded that the mere fact that a party is able to obtain information necessary for reidentification does not make reidentification reasonably likely to occur.
Moreover, there are obvious practical limits to the third parties that the anonymization inquiry should consider. Third parties should be relevant only to the extent they are reasonably likely to seek to reidentify data or to assist another party in doing so. In contrast, a third party does not pose a threat to a data subject where it has neither the incentive nor the intention to reidentify information, even though it may possess the requisite means. This is reflected in guidance from the Irish Data Protection Commission, which instructs practitioners to consider probable intruders and their motives. The U.K. Information Commissioner’s Office (ICO) has adopted a similar approach, as discussed in the following section.
The United Kingdom
The ICO recently issued draft anonymization guidance, which was open for public comment until Dec. 31, 2022. The guidance adopts a practical approach, instructing those seeking to anonymize data to “take into account all the means reasonably likely to be used, by yourself or a third party, to identify an individual that the information relates to.” In doing so, parties should consider factors such as the cost and time required to reidentify data, the sensitivity of the data and the audience to which the data will be released. For example, data distributed to a limited number of parties under nondisclosure agreements poses less of a reidentification risk than data published on a public website. This inquiry is made from the perspective of a “motivated intruder,” a person with a motive to reidentify the data who is neither “a relatively inexpert member of the public” nor a computer science expert. The ICO also rejected the Working Party’s data-deletion rule, finding that data may be anonymized in the hands of an organization that does not possess a key to reidentify it. In contrast, information remains personal data if it is held by a party with the means to reidentify it.
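The technical side of the ICO’s key-holding distinction can be illustrated with keyed tokens. The Python below is a minimal sketch that assumes HMAC-SHA256 tokenization. The ICO guidance does not prescribe this or any particular technique. The names and the helpers `tokenize` and `relink` are hypothetical. The sketch shows one thing only. The organization that retains the key can recompute tokens for names it already knows and re-link them. A recipient without the key cannot do so. Whether the recipient’s copy counts as anonymized still depends on the other factors above, and an absolutist reading such as Breyer’s may treat it differently.

```python
import hashlib
import hmac
import secrets


def tokenize(name: str, key: bytes) -> str:
    """Deterministic keyed token (HMAC-SHA256) standing in for a name."""
    return hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()


def relink(tokens, candidate_names, key: bytes):
    """Map each token to a candidate name by recomputing tokens with `key`.

    Tokens that match no candidate map to None. Without the original key,
    the tokens cannot be recomputed, so nothing matches.
    """
    lookup = {tokenize(n, key): n for n in candidate_names}
    return {t: lookup.get(t) for t in tokens}


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # retained by the disclosing organization
    shared = [tokenize(n, key) for n in ["Alice", "Bob"]]  # released tokens
    candidates = ["Alice", "Bob", "Chen"]

    # The key holder can re-link the released tokens to names.
    print(relink(shared, candidates, key))
    # A recipient trying another key finds no matches.
    print(relink(shared, candidates, secrets.token_bytes(32)))
```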
Practical advice for anonymization
The anonymization approaches taken by the Working Party, the German courts, the English courts and various supervisory authorities point to three key areas of disagreement: whether underlying data must be deleted for a data set to be anonymized, whether the anonymization standard is relative or absolute, and whether reidentification must be practically impossible or merely reasonably unlikely. In light of these gray areas, practitioners should consider the following:
- Delete all information underlying an anonymized data set where feasible. If deletion is not practical, the data controller should document why it cannot delete the underlying data set, employ robust anonymization measures, including randomizing and generalizing data, and limit disclosure to the extent feasible, i.e., to a limited number of entities under nondisclosure agreements that bar further disclosure of the data.
- Consider information a third party could use to reidentify individuals in an anonymized data set. Supervisory authorities are unlikely to adopt a purely absolutist approach to anonymization (i.e., asking whether any party in the world could conceivably reidentify data), but, as in Breyer, they will likely consider all information sources that a party seeking to reidentify data could reasonably access.
- Remember that, even where the Working Party’s data-deletion rule does not apply, data will not be considered anonymized as to your organization if your organization possesses the means to reidentify it.
- Take context into account when deciding on the anonymization measures to employ. Sensitive data with the potential to harm individuals if misused will require more thorough anonymization measures than benign data sets.
- Deidentify information as thoroughly as possible. Although most jurisdictions will likely adopt a practical standard, asking whether reidentification is reasonably likely, some, such as the CNIL, may apply a more stringent test.
Finally, practitioners should monitor the EDPB, as it is likely to issue guidelines that apply to the anonymization of data across the EU. Hopefully, the guidance will reject the Working Party’s data-deletion rule, evaluate anonymization according to contextual considerations and adopt a practical standard that asks whether reidentification is reasonably likely (as opposed to impossible). This would strike the best balance of providing reasonable protection to data subjects, without unduly constraining the organizations that handle their data. In the meantime, practitioners will have to do what they can to straddle the divergent anonymization approaches applied across the U.K. and EU.