On the robustness of Active Learning Models against Data Poisoning

Supervisor(s): Nicolas Müller
Status: finished
Topic: Others
Author: Roman Canals
Submission: 2020-10-15
Type of Thesis: Bachelor's thesis
Thesis topic in co-operation with the Fraunhofer Institute for Applied and Integrated Security AISEC, Garching


This bachelor's thesis investigates vulnerabilities of machine learning models that use Active Learning. In addition to the security problems common to machine learning in general, Active Learning as a concept opens up new degrees of freedom for a potential attacker.


Consider an Active Learning environment in which crowdsourcing serves as the Oracle. An attacker could deploy malicious bots that mislabel data on a large scale and thereby render the model unusable. Besides the Oracle, the Query Strategy is another potential point of attack. In a Pool-Based Sampling or Stream-Based Selective Sampling scenario, an attacker could inject unlabeled malicious instances that look attractive to the Query Strategy but impair the model once they are labeled and used for training. Even if the model is not impaired but merely fails to improve, this is often a success for the attacker. In the application areas of Active Learning, data is also often subject to concept drift, and this natural drift of the data gradually degrades the model. These and other security problems are presented and examined in various related works.
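The Oracle attack described above can be illustrated with a minimal sketch. It is not taken from the thesis: the one-dimensional toy problem, the helper names (`true_label`, `make_oracle`, `label_error_rate`) and the flip rate of 0.4 are assumptions chosen only to show how a crowdsourced Oracle dominated by malicious labelers returns wrong labels for a large share of queries.

```python
import random

def true_label(x, boundary=0.5):
    """Ground truth of the hypothetical toy problem: class 1 iff x >= boundary."""
    return 1 if x >= boundary else 0

def make_oracle(flip_rate, rng):
    """Build an oracle that flips the true label with probability flip_rate.

    flip_rate = 0.0 models an honest oracle; a larger value models a
    crowd in which malicious bots answer a share of the queries.
    """
    def oracle(x):
        y = true_label(x)
        return 1 - y if rng.random() < flip_rate else y
    return oracle

def label_error_rate(oracle, queries):
    """Fraction of queried instances for which the oracle's label is wrong."""
    wrong = sum(oracle(x) != true_label(x) for x in queries)
    return wrong / len(queries)

rng = random.Random(0)                    # fixed seed for reproducibility
queries = [rng.random() for _ in range(1000)]
clean = make_oracle(0.0, rng)             # honest oracle
poisoned = make_oracle(0.4, rng)          # assumed: 40% of answers are flipped

print(label_error_rate(clean, queries))
print(label_error_rate(poisoned, queries))
```

Because every label the learner receives comes from the Oracle, the error rate of the poisoned Oracle translates directly into label noise in the training set.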


B. Miller et al. [1] present an attack concept for a Pool-Based Active Learning environment that uses Uncertainty Sampling as its Query Strategy. However, their approach assumes only the simplest data sets and considers neither complex data distributions nor higher-dimensional features.
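To make the attack surface of Uncertainty Sampling concrete, the following is a minimal sketch of a pool-based query step. It reproduces neither the attack in [1] nor the one developed in this thesis. The fixed one-dimensional logistic model, its parameters `w` and `b`, and the pool values are hypothetical. The sketch only shows that instances placed close to the current decision boundary are the ones the strategy selects.

```python
import math

def predict_proba(x, w=10.0, b=-5.0):
    """Toy 1-D logistic model P(y=1 | x); w and b are hypothetical, boundary at x = 0.5."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def uncertainty(x):
    """Least-confidence score: 1 - max class probability (largest when p = 0.5)."""
    p = predict_proba(x)
    return 1.0 - max(p, 1.0 - p)

def query(pool, k):
    """Return the k unlabeled pool instances the model is least certain about."""
    return sorted(pool, key=uncertainty, reverse=True)[:k]

benign_pool = [0.05, 0.2, 0.35, 0.7, 0.9]
injected = [0.49, 0.5, 0.51]              # attacker-chosen points near the boundary
selected = query(benign_pool + injected, k=3)

print(selected)
```

In this toy setting the whole query budget of three is spent on the injected points, so no benign instance is labeled in that round. This corresponds to the case in which the model is not necessarily impaired but also does not improve.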


In this thesis, a new attack concept is presented that operates under very similar conditions but can be applied to a significantly wider spectrum of classification problems (arbitrarily distributed data of arbitrary size with an arbitrary number of features).