De-identification
The process of removing or obscuring personal identifiers in a data set so that individuals are difficult to identify, thereby protecting their privacy.
Definition
Techniques include pseudonymization (replacing direct identifiers with keys or tokens), k-anonymity (ensuring each record shares its quasi-identifier values with at least k-1 other records), generalization (coarsening data granularity, for example replacing an exact age with an age range), and suppression (omitting sensitive fields or records). Effective de-identification balances privacy against data utility, and requires re-identification risk assessments that are repeated as re-identification techniques evolve.
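The four techniques listed above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the field names, the hard-coded key, and the choice of a keyed hash (HMAC-SHA256) as one possible way to derive pseudonyms. It is not a complete or production-grade de-identification pipeline.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key for illustration only; a real key would be stored securely.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed-hash pseudonym (one possible approach)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def generalize_age(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(record: dict, fields: set) -> dict:
    """Return a copy of the record without the listed fields."""
    return {k: v for k, v in record.items() if k not in fields}


def smallest_group(records: list, quasi_identifiers: list) -> int:
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0


def is_k_anonymous(records: list, quasi_identifiers: list, k: int) -> bool:
    """True if every quasi-identifier combination appears in at least k records."""
    return smallest_group(records, quasi_identifiers) >= k
```

In this sketch, `is_k_anonymous(records, ["age", "zip"], 2)` would return `False` for any data set in which some age-range and ZIP combination appears only once, which signals that further generalization or suppression is needed before release.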
Real-World Example
A city publishes de-identified transit logs by replacing user IDs with random tokens and aggregating locations to 1-km grid cells. Periodic re-identification tests check that no individual's trips can be traced, so the data can be released as open data without compromising rider privacy.
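A rough sketch of how such a release might be prepared is shown below. The record layout (`user_id`, `x_m`, `y_m`, `hour`) is hypothetical, coordinates are assumed to be already projected to metres, and the `sparse_cells` check is only one simple stand-in for a re-identification test, not the method any particular city uses.

```python
import secrets
from collections import defaultdict

GRID_M = 1000  # cell size in metres (1-km grid)


def to_cell(x_m: float, y_m: float, size: int = GRID_M) -> tuple:
    """Snap a projected coordinate (metres) to the index of its grid cell."""
    return (int(x_m // size), int(y_m // size))


def release(trips: list) -> list:
    """Replace user IDs with random tokens and locations with grid cells."""
    tokens = {}  # user_id -> random token; discarded after the release is built
    out = []
    for t in trips:
        if t["user_id"] not in tokens:
            tokens[t["user_id"]] = secrets.token_hex(8)
        out.append({
            "rider": tokens[t["user_id"]],
            "cell": to_cell(t["x_m"], t["y_m"]),
            "hour": t["hour"],
        })
    return out


def sparse_cells(released: list, k: int) -> list:
    """List (cell, hour) pairs seen for fewer than k distinct riders."""
    riders = defaultdict(set)
    for r in released:
        riders[(r["cell"], r["hour"])].add(r["rider"])
    return sorted(key for key, seen in riders.items() if len(seen) < k)
```

Any (cell, hour) pair returned by `sparse_cells` is a candidate for further aggregation or suppression, since a trip that is the only one in its cell and hour is easier to link back to a person.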