Semi-Supervised Opinion Spam Detection
Supervisor(s): | Bojan Kolosnjaji |
Status: | finished |
Topic: | Machine Learning Methods |
Author: | Muhammad Bilal Javed |
Submission: | 2016-06-15 |
Type of Thesis: | Master's thesis |
Proof of Concept: | No |
Abstract: | Opinion spam detection is a new and exciting area of research with a strong emphasis on statistical spam detection techniques. With the rise of social networking, people now share a great deal about themselves and their experiences on social networking platforms, which leaves room for spammers to distort public opinion and choices. The problem of opinion spam is therefore a growing concern for websites, businesses, and customers alike. In this work, we propose a novel way of detecting opinion spam in hotel review data using a fully semi-supervised approach which, to the best of our knowledge, has not been tried before. We take inspiration from prior work that uses unlabeled data to boost performance in document classification, and we apply this idea successfully to our problem. We developed three algorithms and evaluated their performance on labeled test data. Using our approach, we achieved an overall accuracy of 69.2% on labeled test data without using reviewer-based behavioral features, which improves on the previous benchmark. By using reviewer-based behavioral features for the labeled training data alone, we achieved an overall accuracy of 84.6% on labeled test data, which is almost as good as the previous best results. |