Description
In light of recent legislation and software supply chain attacks, Software Bills of
Materials (SBOMs) are emerging as a prominent approach to improving software
supply chain security. Prior studies of automatic SBOM generation tools reveal notable
differences between the tools' outputs and limitations in the correctness of those outputs.
However, no existing research analyzes the performance of such tools on real-world
projects in the Python ecosystem at large scale. In this thesis, we construct a sample of 9,038
GitHub Python repositories and use four popular SBOM generation tools to generate
SBOMs for them, combining an analysis of their workflows with a large-scale differential
comparison of their outputs. We compare the listed components and dependency
relationships across all generated SBOMs, identifying inter-tool inconsistencies and
tool weaknesses as well as challenges arising from the Python ecosystem, particularly
around missing and inconsistent dependency metadata. Based on these findings,
we propose best practices for SBOM generation in Python and provide guidance
for practitioners on using current tools effectively to support software supply chain
security.