Description
This thesis covers the topics of Federated Learning and Differential Privacy and the
use of these technologies in an industrial context. To examine this, two Machine
Learning models, Linear Regression and Naive Bayes Classification, were designed
and implemented within a Federated Learning system. The Flower framework was used
for the federated infrastructure. The distributed training approach was tested both
on a single machine and on four virtual machines connected via their IP addresses.
Differential Privacy was applied to the training datasets and to the prior probabilities
of the classifier. The implemented solution was tested with sample datasets that
are designed for the implemented types of Machine Learning models, and with the
industrial data provided. The federated training process had only a minor impact on the
quality of the models. The impact of Differential Privacy applied to the training datasets
was also not significant: although the training data was noisy, a useful Linear
Regression model and a good Naive Bayes classifier could still be trained. For the application of
Differential Privacy, the IBM DiffPrivLib library was chosen. The Laplace distribution was used
to add noise to the data. An appropriate parametrisation of Differential Privacy was
approximated using an approach to reconstruct a workpiece from given feed rates of
the spindle. With this approach a suitable ratio of sensitivity to ε could be found, which
was then applied to the training datasets for the models. Further, noise was added to
the prior probabilities of the classifier before they were sent to the server. To evaluate
the effect of the added noise, different values for the parameter ε were tested and the
effects on the resulting model were compared to one another. In future work on this
topic, an evaluation metric for the confidentiality of prior probabilities should be
found and taken into consideration when choosing the amount of added noise.
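The noise-addition step described above can be sketched as follows. This is a minimal illustration using NumPy's Laplace sampler with scale sensitivity/ε, which is the same quantity DiffPrivLib's Laplace mechanism computes; the function names and the clipping/renormalisation of the noisy priors are assumptions for this sketch, not details taken from the thesis:

```python
import numpy as np

def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Perturb each value with Laplace noise of scale sensitivity/epsilon
    (the scale used by the Laplace mechanism in Differential Privacy)."""
    scale = sensitivity / epsilon
    return np.asarray(values, dtype=float) + rng.laplace(
        loc=0.0, scale=scale, size=np.shape(values)
    )

def noisy_priors(priors, sensitivity, epsilon, rng):
    """Add Laplace noise to class prior probabilities before sending them
    to the server; clip and renormalise so the result stays a distribution
    (a plausible post-processing step, assumed here)."""
    noisy = laplace_mechanism(priors, sensitivity, epsilon, rng)
    noisy = np.clip(noisy, 1e-12, None)  # keep probabilities positive
    return noisy / noisy.sum()

# Example: perturb the priors of a two-class Naive Bayes classifier.
rng = np.random.default_rng(42)
perturbed = noisy_priors([0.7, 0.3], sensitivity=1.0, epsilon=1.0, rng=rng)
```

A smaller ε yields a larger noise scale and therefore stronger privacy but less accurate priors, which is the trade-off the thesis evaluates by comparing models trained under different ε values.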
This work demonstrated the feasibility of applying Federated Learning and Differential
Privacy to industrial data collected on machine tools during production. The
results show that the federated training approach leads only to a small loss in model
quality, and the use of Differential Privacy has a similarly small influence on the trained
model. Nevertheless, the use of these technologies can ensure data confidentiality,
which is important in an industrial use case, because the data used contains sensitive
information such as Intellectual Property.