How to anonymize (and pseudonymize) clinical data

Anonymization and pseudonymization are not the same. Learn when to use each and how to implement them correctly.

1
Distinguish anonymization from pseudonymization
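Anonymization means individuals can no longer be re-identified by any means reasonably likely to be used. Pseudonymization replaces direct identifiers with a code, but re-identification remains possible through a separately held key, so under GDPR the data is still personal data.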
2
Identify all identifying variables
Direct identifiers point to a person on their own: name, SSN, medical record number. Indirect identifiers (quasi-identifiers) can identify a person when combined: birth date, zip code, occupation.
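As a minimal sketch of this inventory step, the snippet below sorts a dataset's column names into direct identifiers, quasi-identifiers, and everything else. The column names and the classify_columns helper are hypothetical; a real inventory should come from your study's data dictionary, not from a fixed list.

```python
# Hypothetical column names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "ssn", "medical_record_number"}
QUASI_IDENTIFIERS = {"birth_date", "zip_code", "occupation"}


def classify_columns(columns):
    """Split column names into direct identifiers, quasi-identifiers, and the rest."""
    result = {"direct": [], "quasi": [], "other": []}
    for col in columns:
        key = col.strip().lower()  # compare case-insensitively
        if key in DIRECT_IDENTIFIERS:
            result["direct"].append(col)
        elif key in QUASI_IDENTIFIERS:
            result["quasi"].append(col)
        else:
            result["other"].append(col)
    return result


report = classify_columns(["name", "SSN", "birth_date", "zip_code", "hba1c"])
print(report)
```

Columns that land in "other" still deserve a manual review: free-text fields and rare diagnoses can identify a person even though no simple list will flag them.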
3
Define your pseudonymization scheme
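A typical scheme assigns each participant a subject ID such as 'US01-001' (site code plus sequential order). The key that links subject IDs to identities is kept separate from the analysis dataset.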
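The scheme above could be sketched as follows. This is an illustration under assumptions, not a prescribed implementation: the field names ("name", "mrn"), the three-digit zero padding, and the helpers make_subject_id and pseudonymize are all hypothetical.

```python
# Fields treated as direct identifiers in this sketch (hypothetical names).
DIRECT_FIELDS = ("name", "mrn")


def make_subject_id(site_code, order):
    """Build an ID such as 'US01-001' from a site code and a sequential order."""
    return f"{site_code}-{order:03d}"


def pseudonymize(records, site_code):
    """Return (analysis_rows, key_table).

    analysis_rows carry the subject ID but no direct identifiers;
    key_table maps each subject ID back to the direct identifiers.
    """
    analysis_rows, key_table = [], []
    for order, record in enumerate(records, start=1):
        subject_id = make_subject_id(site_code, order)
        key_row = {"subject_id": subject_id}
        key_row.update({f: record[f] for f in DIRECT_FIELDS})
        key_table.append(key_row)
        clinical = {k: v for k, v in record.items() if k not in DIRECT_FIELDS}
        analysis_rows.append({"subject_id": subject_id, **clinical})
    return analysis_rows, key_table


records = [
    {"name": "Alice Example", "mrn": "MRN-0001", "hba1c": 6.1},
    {"name": "Bob Example", "mrn": "MRN-0002", "hba1c": 7.4},
]
analysis_rows, key_table = pseudonymize(records, "US01")
print(analysis_rows)
```

The two return values are meant to go to different places: only analysis_rows belongs with the study data.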
4
Generalize quasi-identifiers
Birth date → year. Zip code → state. Age → 5-year bracket. Then evaluate k-anonymity: every combination of quasi-identifier values should be shared by at least k records.
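The generalizations above, plus a basic k-anonymity check, might look like the sketch below. The record fields, the generalize and k_anonymity helpers, and the tiny zip-prefix lookup are assumptions made for illustration; a real study would need a complete zip-to-state reference table.

```python
from collections import Counter
from datetime import date

# Illustrative lookup only: two 3-digit zip prefixes, not a full table.
ZIP_PREFIX_TO_STATE = {"100": "NY", "021": "MA"}


def generalize(record):
    """Coarsen quasi-identifiers: birth date to year, zip to state, age to 5-year bracket."""
    low = (record["age"] // 5) * 5
    return {
        "birth_year": date.fromisoformat(record["birth_date"]).year,
        "state": ZIP_PREFIX_TO_STATE[record["zip_code"][:3]],
        "age_bracket": f"{low}-{low + 4}",
    }


def k_anonymity(rows, quasi_cols):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[c] for c in quasi_cols) for row in rows)
    return min(groups.values()) if groups else 0


records = [
    {"birth_date": "1980-03-14", "zip_code": "10001", "age": 44},
    {"birth_date": "1980-11-02", "zip_code": "10025", "age": 43},
    {"birth_date": "1975-06-30", "zip_code": "02139", "age": 49},
]
generalized = [generalize(r) for r in records]
k = k_anonymity(generalized, ["birth_year", "state", "age_bracket"])
print(k)
```

In this toy dataset the third record is alone in its group, so k is 1 and that participant remains unique. Typical responses are coarser brackets or suppressing the outlying record, then re-running the check.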
5
Store the key separately
The re-identification key must NEVER live in the same repository as the clinical data. Ideally it is held by a different role (DPO/Privacy Officer) than the people who analyze the data.
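One way to enforce this separation in code is to refuse to write the key anywhere inside the clinical data location. The sketch below assumes CSV files on a local filesystem; the file names, directory names, and the store_separately helper are hypothetical, and in practice the key would usually sit on a separate system with its own access controls.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def store_separately(analysis_rows, key_rows, analysis_dir, key_dir):
    """Write data and key to different locations; reject a key path inside the data path."""
    analysis_dir = Path(analysis_dir).resolve()
    key_dir = Path(key_dir).resolve()
    if key_dir == analysis_dir or analysis_dir in key_dir.parents:
        raise ValueError("key must not be stored inside the clinical data location")
    write_csv(analysis_dir / "analysis.csv", analysis_rows)
    key_path = key_dir / "reidentification_key.csv"
    write_csv(key_path, key_rows)
    key_path.chmod(0o600)  # owner read/write only (effective on POSIX systems)
    return key_path


analysis = [{"subject_id": "US01-001", "hba1c": 6.1}]
key = [{"subject_id": "US01-001", "name": "Alice Example"}]

with tempfile.TemporaryDirectory() as root:
    data_dir = Path(root) / "clinical_data"
    vault_dir = Path(root) / "key_vault"
    data_dir.mkdir()
    vault_dir.mkdir()
    key_path = store_separately(analysis, key, data_dir, vault_dir)
    key_outside_data = data_dir.resolve() not in key_path.parents
    try:
        store_separately(analysis, key, data_dir, data_dir / "keys")
        rejected = False
    except ValueError:
        rejected = True

print(key_outside_data, rejected)
```

A path check like this only guards against accidental co-location. It does not replace organizational separation, where a different person or role controls access to the key.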
6
Document the process
Your institution's DPO must know about and approve the scheme. Record it in the Record of Processing Activities (RoPA) and, if applicable, in the Data Protection Impact Assessment (DPIA).

