On the robustness of Active Learning Models against Data Poisoning

Supervisor(s):	Nicolas Müller
Status:	finished
Topic:	Others
Author:	Roman Canals
Submission:	2020-10-15
Type of Thesis:	Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching
Description
This bachelor's thesis investigates vulnerabilities of machine learning models that use Active Learning. In addition to the security problems common to machine learning in general, Active Learning as a concept opens new degrees of freedom for a potential attacker. Consider an Active Learning environment in which crowdsourcing serves as the Oracle: an attacker could deploy malicious bots that mislabel data on a large scale to disable the model. Besides the Oracle, the Query Strategy is another potential point of attack. In a Pool-Based Sampling or Stream-Based Selective Sampling scenario, an attacker could inject (unlabeled) malicious instances that appear attractive to the Query Strategy but impair the model. Even if the model is not impaired but merely fails to improve, this is often a success for the attacker. In addition, data in typical Active Learning applications is often subject to concept drift, and this natural drift of the data gradually impairs the model.
These and other security problems are presented and examined in related research. B. Miller et al. [1] present an attack on a Pool-Based Active Learning environment that uses Uncertainty Sampling as its Query Strategy. However, their approach assumes very simple data sets and considers neither complex data distributions nor higher-dimensional features. This thesis presents a new attack concept for a very similar setting that can be applied to a significantly wider spectrum of classification problems (arbitrarily distributed data of arbitrary size with an arbitrary number of features).
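The Oracle attack described above can be illustrated with a minimal sketch. It is not the attack developed in the thesis, nor the one from [1]; it only shows a Pool-Based loop with Uncertainty Sampling in which the Oracle flips labels with some probability. Everything in it is an assumption made for illustration: the two-Gaussian toy data, the nearest-centroid classifier, and the hypothetical names fit, margin, predict, run and flip_prob.

```python
import random


def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def fit(labeled):
    """Toy nearest-centroid model: one centroid per class (0 and 1)."""
    return {c: centroid([x for x, y in labeled if y == c]) for c in (0, 1)}


def margin(model, x):
    """A small margin means the model is uncertain about x."""
    return abs(dist2(x, model[0]) - dist2(x, model[1]))


def predict(model, x):
    return 0 if dist2(x, model[0]) <= dist2(x, model[1]) else 1


def run(flip_prob, rounds=30, seed=0):
    """Run pool-based active learning with a (possibly poisoned) oracle.

    flip_prob is the assumed probability that the oracle returns the
    wrong label. Returns the accuracy on a clean held-out test set.
    """
    rng = random.Random(seed)

    def sample(label):
        # Assumed toy data: two Gaussian blobs centred at x = -2 and x = +2.
        cx = -2.0 if label == 0 else 2.0
        return (rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)), label

    pool = [sample(i % 2) for i in range(200)]
    test = [sample(i % 2) for i in range(200)]
    labeled = [sample(0), sample(1)]  # clean seed set, one point per class

    for _ in range(rounds):
        model = fit(labeled)
        # Uncertainty sampling: query the pool instance with the smallest margin.
        i = min(range(len(pool)), key=lambda j: margin(model, pool[j][0]))
        x, y_true = pool.pop(i)
        # Poisoned oracle: flips the true label with probability flip_prob.
        y = 1 - y_true if rng.random() < flip_prob else y_true
        labeled.append((x, y))

    model = fit(labeled)
    return sum(predict(model, x) == y for x, y in test) / len(test)


if __name__ == "__main__":
    for p in (0.0, 0.5):
        print(f"flip_prob={p}: test accuracy {run(p):.3f}")
```

Comparing run(0.0) with runs at higher flip_prob values gives a rough feel for how sensitive this particular toy setup is to a dishonest Oracle; the numbers say nothing about the models or data sets studied in the thesis.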
