Review anonymization of data in "filter-tracking-log" plugin
We wrote a plugin that filters the tracking log data.
The goal of the plugin is to:
- Make sure no user-identifiable information is stored in our tracking log. This includes:
  - IP addresses
  - user IDs
  - usernames
- Pseudonymize session IDs in the tracking log. The idea is that:
  - We need a session ID to distinguish between different people's sessions.
  - We assume that a session ID is sufficient to distinguish different people. As a result, people who often log out and back in on Totem cannot be tracked as the same person, but we assume that our data set is still meaningful with this limitation.
  - We assume that by hashing the session ID and removing the second half of the session ID, we get a pseudonymized session ID that cannot be traced back to the session cookie in somebody's browser (at least not with certainty).
That last assumption is a big one, and I would explicitly ask for the auditor's opinion on it. We can easily pseudonymize the session ID in a different way.
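To make the scheme easier to review, here is a minimal Python sketch of the filtering described above. It is not the plugin's actual code. The field names (`ip`, `user_id`, `username`, `session`), the function names, and the choice of SHA-256 are all assumptions made for illustration. It also assumes one reading of "hashing and removing the second half": hash the full session ID first, then keep only the first half of the hex digest.

```python
import hashlib

# Hypothetical field names; the real tracking log schema may differ.
IDENTIFYING_FIELDS = ("ip", "user_id", "username")
SESSION_FIELD = "session"


def pseudonymize_session_id(session_id: str) -> str:
    """Hash the session ID and keep only the first half of the hex digest.

    SHA-256 is an assumption; the plugin may use a different hash.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return digest[: len(digest) // 2]


def filter_event(event: dict) -> dict:
    """Return a copy of a tracking event without identifying fields.

    Only top-level keys are handled here; nested fields would need
    the same treatment.
    """
    filtered = {k: v for k, v in event.items() if k not in IDENTIFYING_FIELDS}
    session_id = filtered.get(SESSION_FIELD)
    if session_id:
        filtered[SESSION_FIELD] = pseudonymize_session_id(str(session_id))
    return filtered
```

Under these assumptions, the same session ID always maps to the same pseudonym, so sessions stay distinguishable. One point for the auditor: with an unkeyed hash, anyone who holds the original session cookie value can recompute the truncated hash and look for it in the log. If that matters, a keyed hash (for example HMAC with a secret that is not stored alongside the logs) would be one of the alternative approaches.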
Here's a tracking log with the filter plugin turned off:
Here's one with the filter plugin turned on: