Guardians of the Data: Searchable encryption for document management systems
Supervisor(s): | Fabian Franzen |
Status: | finished |
Topic: | Others |
Author: | Luka Tomas |
Submission: | 2025-03-24 |
Type of Thesis: | Bachelor's Thesis |
Description: | As digitization accelerates, efficiently managing and securely storing digital documents has become increasingly critical. Document Management Systems (DMS) streamline organizational workflows by providing indexing, metadata extraction, and full-text search capabilities. However, most existing DMS solutions, such as the popular open-source project Paperless-NGX, either neglect encryption entirely or rely on server-side encryption, which undermines data confidentiality by storing encryption keys alongside the encrypted documents. This limitation calls for a solution that provides confidentiality and search functionality at the same time. We address this challenge by introducing Miniwhoosh, a search library with support for Searchable Symmetric Encryption (SSE) based on an inverted index. We detail the design and implementation of Miniwhoosh and its integration into the Paperless-NGX project, emphasizing the minimal disruption to the existing architecture. Furthermore, we provide a performance analysis comparing Miniwhoosh’s encrypted and unencrypted modes with the original Whoosh implementation. The evaluation covers document ingestion performance, search latency, and memory usage, demonstrating Miniwhoosh’s viability for practical deployment. Our analysis shows that Miniwhoosh significantly improves indexing speed and maintains comparable search responsiveness, but introduces memory overhead, primarily due to encryption. Overall, this thesis contributes a practical, efficient, and secure document management solution that bridges the gap between confidentiality and usability, enabling secure storage and search operations on untrusted servers. |
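To make the idea of searchable symmetric encryption over an inverted index more concrete, the following is a minimal Python sketch. It is not Miniwhoosh's actual design or API: the class `ToyEncryptedIndex`, its methods, and the helper functions are hypothetical names chosen for illustration. The sketch assumes that search terms are replaced by keyed HMAC tokens and that document identifiers in each posting list are encrypted on the client. The XOR keystream used here is a stand-in that keeps the example dependency-free; a real system would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets


def _prf(key: bytes, data: bytes) -> bytes:
    """Keyed pseudorandom function (HMAC-SHA256)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with an HMAC-derived keystream.

    Illustration only. Applying it twice with the same key and nonce
    returns the original data, so it serves for both directions.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = _prf(key, nonce + offset.to_bytes(8, "big"))
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


class ToyEncryptedIndex:
    """Hypothetical client holding the keys; `server_index` is what an
    untrusted server would store."""

    def __init__(self, token_key: bytes, enc_key: bytes) -> None:
        self._token_key = token_key
        self._enc_key = enc_key
        # Maps an opaque search token to a list of encrypted document IDs.
        self.server_index = {}

    def _token(self, term: str) -> bytes:
        # Deterministic token: the server can match it but not invert it.
        return _prf(self._token_key, term.lower().encode("utf-8"))

    def add_document(self, doc_id: str, text: str) -> None:
        for term in set(text.lower().split()):
            nonce = secrets.token_bytes(16)
            ciphertext = _xor_stream(self._enc_key, nonce, doc_id.encode("utf-8"))
            self.server_index.setdefault(self._token(term), []).append(nonce + ciphertext)

    def search(self, term: str) -> list[str]:
        # The server looks up the token; the client decrypts the postings.
        entries = self.server_index.get(self._token(term), [])
        return sorted(
            _xor_stream(self._enc_key, entry[:16], entry[16:]).decode("utf-8")
            for entry in entries
        )


index = ToyEncryptedIndex(secrets.token_bytes(32), secrets.token_bytes(32))
index.add_document("doc-1", "Invoice for office rent")
index.add_document("doc-2", "Rent contract 2024")
print(index.search("rent"))  # ['doc-1', 'doc-2']
```

Because the tokens are deterministic, a server in this sketch can still observe which queries repeat and how many documents match a token. Leakage of this kind is a common trade-off in simple SSE constructions and is not addressed by the example.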