
Privacy-Preserving Distributed Learning Techniques

Supervisor(s): Immanuel Kunz
Status: finished
Topic: Others
Author: Katharina Emde
Submission: 2020-09-15
Type of Thesis: Master's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching

Description

Over the last few decades, the number of machine learning applications has increased
steadily. Although their structures and specific purposes vary widely,
they all require a massive amount of input data to learn from.
This development, along with commercially available "machine learning as a service" offerings
from providers such as Google or Amazon, calls for greater awareness of data privacy in the context
of machine learning.
On the one hand, distributed learning techniques, such as federated learning, promise
to improve privacy by keeping training data at the sources rather than uploading it to a
central (potentially untrusted) service. On the other hand, existing work has already
shown different ways of attacking the privacy of ML models, e.g. through membership
or feature inference. Privacy, however, is not a precise concept, and various goals and
metrics have evolved to quantify it in different domains.
This thesis addresses the handling of privacy in machine learning by formulating
a guideline that supports the selection of suitable privacy
metrics for a machine learning application. To this end, it first examines the applicability
of various known privacy metrics, such as k-anonymity, differential privacy, or the adversary's
success probability, in the context of machine learning. It then formulates recommendations for the use of
metrics depending on the properties of the respective machine learning application.
To evaluate the applicability of the guideline in a distributed
learning setup, a neural network classifying the records of the UCI Adult dataset is implemented
and examined according to the guideline.
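
To make two of the metrics named above more concrete, the following is a minimal Python sketch, not taken from the thesis. It computes the k of k-anonymity for a small table and answers a counting query with Laplace noise, a standard mechanism for epsilon-differential privacy. The function names, the toy records, and the attribute names (age_band, sex, income_high) are assumptions made purely for illustration.

```python
import random
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values on all quasi-identifier attributes."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def laplace_count(records, predicate, epsilon, rng=random):
    """Answer a counting query with Laplace noise.

    A count changes by at most 1 when one record is added or removed,
    so noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    # The difference of two i.i.d. exponentials with rate epsilon
    # is Laplace-distributed with scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical toy records, loosely modeled on census-style attributes.
records = [
    {"age_band": "30-39", "sex": "F", "income_high": True},
    {"age_band": "30-39", "sex": "F", "income_high": False},
    {"age_band": "40-49", "sex": "M", "income_high": True},
    {"age_band": "40-49", "sex": "M", "income_high": False},
    {"age_band": "40-49", "sex": "M", "income_high": True},
]

k = k_anonymity(records, ["age_band", "sex"])
noisy = laplace_count(records, lambda r: r["income_high"], epsilon=1.0)
print("k-anonymity:", k)
print("noisy count of high-income records:", noisy)
```

The two functions measure different things: k-anonymity is a property of a released table, while the Laplace mechanism bounds what a query answer reveals about any single record. This difference is one reason why the choice of metric depends on the application.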