Clonewise - Automatically Identifying Package Clones and Inferring Security Vulnerabilities
The Clonewise service can be used through the submission page.
Developers of software sometimes embed code from other projects. They statically link against an external library, maintain an internal copy of an external library's source code, or fork the development of an external library. A canonical example is the zlib compression library, which is embedded in much software because of its functionality and permissive license. Embedding is generally considered a bad development practice, although developers do it to reduce external dependencies at installation time or to modify the functionality of an external library. The practice is ill advised because of its implications for software maintenance and software security. Once a library is embedded in another package, at least two versions of the same code exist, so bug fixes and security patches must be integrated into each copy instead of being applied once to a system-wide library. Because of these issues, most Linux vendors have package policies that oppose the embedding of code unless a specific exception is granted.
In the example of zlib, each time a vulnerability was discovered in the original upstream source, all embedded copies required patching. However, Linux distributions were uncertain which packages embedded zlib and therefore required patching. In 2005, after a zlib vulnerability was reported, Debian Linux started a dedicated project to perform binary signature scans against packages in the repository to find vulnerable versions of the embedded library. To create a signature, the source code of zlib was manually inspected to find a version string that uniquely identified it. This manual approach still finds vulnerable embedded versions of software today. We constructed signatures for vulnerable versions of compression and image processing libraries including bzip2, libtiff, and libpng. We performed a scan of the Debian and Fedora Linux package repositories and found 5 packages with previously unknown vulnerabilities. Even for actively developed, mainstream projects such as the Mozilla Firefox web browser, we saw windows of exploitability between upstream security fixes and the correction of embedded copies of the image processing libraries. These windows sometimes extended for periods of over 3 months.
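The signature scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the scanner Debian or the authors used. It assumes that compiled copies of zlib contain copyright strings of the form `deflate 1.2.3 Copyright ...` or `inflate 1.2.3 Copyright ...`, and the names `find_embedded_zlib`, `scan_tree`, and the `fixed_version` parameter are hypothetical.

```python
import re
from pathlib import Path

# Assumed signature: zlib's copyright strings, which carry the version number
# and survive compilation, e.g. " deflate 1.2.3 Copyright 1995-2005 ...".
ZLIB_SIGNATURE = re.compile(rb"(?:deflate|inflate) (\d+(?:\.\d+)+) Copyright")


def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_embedded_zlib(data):
    """Return the distinct zlib versions whose signature appears in raw bytes."""
    versions = {m.group(1).decode("ascii") for m in ZLIB_SIGNATURE.finditer(data)}
    return sorted(versions, key=parse_version)


def scan_tree(root, fixed_version):
    """Yield (path, version) for files embedding a zlib older than fixed_version.

    fixed_version is supplied by the caller: the first release known to
    contain the security fix of interest.
    """
    threshold = parse_version(fixed_version)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for version in find_embedded_zlib(data):
            if parse_version(version) < threshold:
                yield path, version
```

Running `scan_tree` over an unpacked package directory lists the files that carry an older embedded copy. A string signature like this only catches copies that keep the version string intact, which is part of why the approach needs one hand-made signature per library.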
This approach of manually searching for embedded copies of specific libraries deals poorly with the scale of the problem. According to the list of tracked embedded packages in Debian Linux, over 420 packages are embedded in other software in the repository. This list was created manually, and our results show that it is incomplete. Other Linux vendors were not even tracking embedded copies before our research supplied them with relevant data. It is evident from this that an automated approach is needed for identifying embedded packages without prior knowledge of which packages to search for. This would aid security teams in performing audits on new vulnerabilities in upstream sources. Moreover, correlating embedded package relationships to known vulnerabilities would provide vendors with actionable patch strategies without a full audit of their sometimes massive package repositories.
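To make the idea of searching without prior knowledge concrete, the sketch below compares every pair of packages by the source file names they share and reports pairs with a large overlap. It is an illustrative heuristic only, not a description of how Clonewise itself works. The function names, the `COMMON_NAMES` list, and the threshold values are all assumptions chosen for the example.

```python
from itertools import combinations

# Assumed list of file names too generic to indicate shared code.
COMMON_NAMES = frozenset({"Makefile", "README", "COPYING", "configure", "main.c"})


def overlap(a, b):
    """Overlap coefficient: shared names relative to the smaller set.

    Used instead of Jaccard similarity because an embedded library is usually
    a small subset of the package that embeds it.
    """
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def candidate_clones(packages, threshold=0.8, min_shared=3):
    """Return (pkg_a, pkg_b, shared_names) for package pairs that look related.

    packages maps a package name to an iterable of its source file base names.
    """
    names = sorted(packages)
    files = {name: set(packages[name]) - COMMON_NAMES for name in names}
    result = []
    for a, b in combinations(names, 2):
        shared = files[a] & files[b]
        if len(shared) >= min_shared and overlap(files[a], files[b]) >= threshold:
            result.append((a, b, sorted(shared)))
    return result
```

Each reported pair is only a candidate that still needs confirmation, for example by comparing file contents. Pairwise comparison is quadratic in the number of packages, so a full repository would need some form of indexing.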
This project is open source and available to download from GitHub.
The relationships between Fedora packages are shown below. Each node in the graph represents a package, and an edge between two packages indicates that they share a clone. It is clear there are many clones!
- Silvio Cesare, Ruxcon, "Automated Detection of Software Bugs and Vulnerabilities in Linux", 2011. [slides]
- Silvio Cesare, Ruxmon, "Simple Bugs and Vulnerabilities in Linux Distributions", 2011. [slides]