Issue #102 in totem / tutor-server, created Jun 16, 2022 by Maarten de Waard (@maarten, Developer)

Modify custom filters of tracking log

A filter was created in #101 (closed) that allows us to modify the lines that end up in the tracking log.

We need to modify that filter so that the lines in the tracking log contain only the minimum of identifying information we need. Currently a line looks like this:

2022-06-14 13:23:45,940 INFO 144 [tracking] [user None] [ip x.x.x.x] logger.py:41 - {"name": "/api/user/v1/account/login_session/", "context": {"user_id": null, "path": "/api/user/v1/account/login_session/", "course_id": "", "org_id": ""}, "username": "", "session": "28790d10905791760865f712b543bcaf", "ip": "x.x.x.x", "agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0", "host": "local.overhang.io", "referer": "http://local.overhang.io/login?next=%2Fcourses", "accept_language": "en-US,en;q=0.5", "event": "{\"GET\": {}, \"POST\": {\"email\": [\"admin@totem-project.org\"], \"password\": \"********\"}}", "time": "2022-06-14T13:23:45.939955+00:00", "event_type": "/api/user/v1/account/login_session/", "event_source": "server", "page": null}

As you can see, #101 (closed) already replaced the IP address with x.x.x.x. But we still have the following data:

  • user
    • In this example line the "user" is None, but that's because it's a tracking log of a failed login attempt (the first thing I had available when making this issue)
    • We want to replace usernames with something non-identifiable. Question: does it even matter for us, or for Cairn, which username did something? Maybe we can replace all usernames with a bogus value (x) without affecting the data we collect. The second-best option is to pseudonymize or anonymize the username. We need to research the best ways to do that.
  • name -- This is the URL that was tracked. This value can stay as it is
  • context
    • user_id -- should be removed or anonymized, like user above
    • path -- no change
    • course_id -- no change
    • org_id -- no change
  • username -- should be anonymized/pseudonymized, same as the user field
  • session -- The session ID can be linked to a browser cookie. Not sure if we want to keep that data. If we replaced all usernames/user IDs with just x and could still track individual sessions based on the session ID, that would be pretty neat. In that case we probably still want to hash the session ID, so that at least it can't be linked directly to a session cookie on somebody's PC.
  • ip -- preferably fully anonymized (as it is after #101 (closed)), but we could also keep part of the IP if that has some kind of value.
  • agent -- I see no reason to keep this, unless we want to know what device people use to access a course.
  • host -- no change
  • referer -- no change
  • accept_language -- no change
  • event -- maybe we want to take the email address out of the log here.
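
To make the list above concrete, here is a minimal Python sketch of what the per-field scrubbing could look like. It is not the filter from #101: the function names (`pseudonymize`, `scrub_event`, `scrub_line`), the `PSEUDONYM_KEY` secret, and the `<email>` placeholder are all hypothetical. It also picks one answer for each open question (keyed hash for username, user_id and session; fully masked IP; agent dropped; email addresses removed from event), which may not be what we end up wanting. It only handles the JSON part of the line; the `[user ...]` prefix added by the log formatter would need to be handled separately.

```python
import hashlib
import hmac
import json
import re

# Hypothetical secret: in a real setup this would come from configuration.
PSEUDONYM_KEY = b"change-me"

# Deliberately simple email pattern, good enough for a sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value, key=PSEUDONYM_KEY):
    """Return a stable keyed hash of value; empty or None values pass through."""
    if value in (None, ""):
        return value
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def scrub_event(event):
    """Apply one possible set of per-field rules to a decoded tracking event."""
    cleaned = dict(event)
    context = cleaned.get("context")
    if isinstance(context, dict) and "user_id" in context:
        cleaned["context"] = {**context, "user_id": pseudonymize(context["user_id"])}
    # Same user or session always maps to the same pseudonym, so events stay linkable.
    for field in ("username", "session"):
        if field in cleaned:
            cleaned[field] = pseudonymize(cleaned[field])
    if "ip" in cleaned:
        cleaned["ip"] = "x.x.x.x"  # same masking as after #101
    cleaned.pop("agent", None)  # assumption: we do not need device information
    if isinstance(cleaned.get("event"), str):
        cleaned["event"] = EMAIL_RE.sub("<email>", cleaned["event"])
    return cleaned


def scrub_line(message):
    """Scrub the JSON payload of a tracking message; leave anything else untouched."""
    try:
        event = json.loads(message)
    except ValueError:
        return message
    if not isinstance(event, dict):
        return message
    return json.dumps(scrub_event(event))
```

A keyed hash (HMAC) is used here instead of a plain hash so that nobody without the key can rebuild the mapping by hashing known usernames or session cookies. If we decide that usernames do not matter at all, `pseudonymize` could simply return x instead.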