# dirty-waters
Dirty-waters automatically finds software supply chain issues in software projects by analyzing the available metadata of all dependencies, transitively.
Reference: [Dirty-Waters: Detecting Software Supply Chain Smells](http://arxiv.org/pdf/2410.16049), Technical report 2410.16049, arXiv, 2024.
By using `dirty-waters`, you identify the shady areas of your supply chain, which would be natural targets for attackers to exploit.
Kinds of problems identified by `dirty-waters`:
- Dependencies with no link to source code repositories (high severity)
- Dependencies with no tag / commit SHA for the release, which makes reproducible builds impossible (high severity)
- Deprecated dependencies (medium severity)
- Dependencies that are forks (medium severity)
- Dependencies with no build attestation (low severity)
Additionally, `dirty-waters` gives a supplier view of the dependency tree (who owns the different dependencies?).
`dirty-waters` is developed as part of the [Chains research project](https://chains.proj.kth.se/).
## Installation
To set up `dirty-waters`, follow these steps:
1. Clone the repository:
```bash
git clone https://github.com/chains-project/dirty-waters.git
cd dirty-waters
```
2. Set up a virtual environment and install dependencies:
```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd tool
```
As an alternative to a virtual environment, you may also use the Nix flake present in this repository.
3. Set up the GitHub API token (ideally, in a `.env` file):
```bash
export GITHUB_API_TOKEN=<your_token>
```
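Because the token is read from the environment, a quick pre-flight check can avoid a failed run. The following is a minimal sketch and not part of `dirty-waters`: the helper name `load_github_token` and the simple `KEY=VALUE` parsing of the `.env` file are assumptions made for illustration, and the tool itself may load the token differently.

```python
import os
from pathlib import Path


def load_github_token(env_file=".env", environ=None):
    """Return GITHUB_API_TOKEN from the environment, falling back to a .env file.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    token = environ.get("GITHUB_API_TOKEN")
    if token:
        return token

    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            # Accept an optional leading "export ".
            if line.startswith("export "):
                line = line[len("export "):]
            key, sep, value = line.partition("=")
            if sep and key.strip() == "GITHUB_API_TOKEN":
                # Drop surrounding whitespace and quotes.
                return value.strip().strip("'\"")

    raise RuntimeError("GITHUB_API_TOKEN is not set in the environment or in .env")
```

Calling `load_github_token()` before starting an analysis surfaces a missing token immediately instead of partway through a run.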
## Usage
Run the tool using the following command structure:
### Arguments:
```
usage: main.py [-h] -p PROJECT_REPO_NAME -v RELEASE_VERSION_OLD [-vn RELEASE_VERSION_NEW] -s [-d] [-n] -pm {yarn-classic,yarn-berry,pnpm,npm,maven} [--pnpm-scope]

options:
  -h, --help            show this help message and exit
  -p PROJECT_REPO_NAME, --project-repo-name PROJECT_REPO_NAME
                        Specify the project repository name. Example: MetaMask/metamask-extension
  -v RELEASE_VERSION_OLD, --release-version-old RELEASE_VERSION_OLD
                        The old release tag of the project repository. Example: v10.0.0
  -vn RELEASE_VERSION_NEW, --release-version-new RELEASE_VERSION_NEW
                        The new release version of the project repository.
  -s, --static-analysis
                        Run static analysis and generate a markdown report of the project
  -d, --differential-analysis
                        Run differential analysis and generate a markdown report of the project
  -n, --name-match      Compare the package names with the name in the in the package.json file. This option will slow down the execution time due to the API rate limit of
                        code search.
  -pm {yarn-classic,yarn-berry,pnpm,npm,maven}, --package-manager {yarn-classic,yarn-berry,pnpm,npm,maven}
                        The package manager used in the project.
  --pnpm-scope          Extract dependencies from pnpm with a specific scope using 'pnpm list --filter <scope> --depth Infinity' command. Configure the scope in tool_config.py
                        file.
```
### Example usage:
1. Static analysis:
```bash
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -s -pm yarn-berry
```
- Example output: [Static Analysis Report Example](example_reports/static_analysis_report_example.md)
2. Differential analysis:
```bash
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -s -d -pm yarn-berry
```
- Example output: [Differential Analysis Report Example](example_reports/differential_analysis_report_example.md)
Notes:
- `-v` should be the tag of the GitHub release, e.g. for [this release](https://github.com/MetaMask/metamask-extension/releases/tag/v11.1.0), the value should be `v11.1.0`, not `Version 11.1.0` or `11.1.0`.
- The `-s` flag is required for all analyses.
- When using `-d` for differential analysis, both `-v` and `-vn` must be specified.
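The flag rules in these notes can be sketched as a small validation function. This is an illustrative sketch only: the function name `check_flag_combination` and its parameters are hypothetical and do not reflect how `main.py` validates its arguments.

```python
def check_flag_combination(static, differential, version_old, version_new=None):
    """Return a list of problems with the chosen flags (empty if the combination is valid).

    Hypothetical helper mirroring the notes above; not the tool's own code.
    """
    problems = []
    if not static:
        # The -s flag is required for all analyses.
        problems.append("-s is required for all analyses")
    if not version_old:
        problems.append("-v is required")
    if differential and not version_new:
        # Differential analysis compares two releases.
        problems.append("-d requires both -v and -vn")
    return problems
```

For example, `check_flag_combination(True, True, "v11.11.0", "v11.12.0")` corresponds to the differential analysis command shown earlier and reports no problems.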
## Software Supply Chain Smell Support
`dirty-waters` currently supports package managers within the JavaScript and Java ecosystems. However, because of constraints inherent to each package manager, not every smell can be detected for every one of them. The following table shows which smells are supported for each package manager:
| Package Manager | No Source Code Repository | Invalid Source Code Repository URL | No Release Tag | Deprecated Dependency | Depends on a Fork | No Build Attestation | No/Invalid Code Signature |
| --------------- | ------------------------- | ---------------------------------- | -------------- | --------------------- | ----------------- | -------------------- | ------------------------- |
| Yarn Classic | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Yarn Berry | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Pnpm | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Npm | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Maven | Yes | Yes | Yes | No | Yes | No | Yes |
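The support matrix above can also be expressed as a lookup, which is handy when scripting around the tool. This is a sketch under stated assumptions: the smell identifiers and the function `supported_smells` are hypothetical names derived from the table's column headers, while the package manager names match the `-pm` choices.

```python
# Smell identifiers are hypothetical names mirroring the table's column headers.
ALL_SMELLS = (
    "no_source_code_repository",
    "invalid_source_code_repository_url",
    "no_release_tag",
    "deprecated_dependency",
    "depends_on_a_fork",
    "no_build_attestation",
    "no_or_invalid_code_signature",
)

# Values accepted by the -pm option.
PACKAGE_MANAGERS = ("yarn-classic", "yarn-berry", "pnpm", "npm", "maven")

# Per the table, only Maven lacks support for some smells.
UNSUPPORTED = {
    "maven": {"deprecated_dependency", "no_build_attestation"},
}


def supported_smells(package_manager):
    """Return the smells the table marks as supported for a package manager."""
    if package_manager not in PACKAGE_MANAGERS:
        raise ValueError(f"unsupported package manager: {package_manager}")
    missing = UNSUPPORTED.get(package_manager, set())
    return [smell for smell in ALL_SMELLS if smell not in missing]
```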
### Smell Check Options
By default, all supported checks for the given package manager are performed in static analysis.
You can restrict the run to individual checks using the following flags. Note that as soon as
at least one flag is passed, only the flagged checks are performed:
- `--check-source-code`: Check for dependencies with no link to source code repositories
- `--check-release-tags`: Check for dependencies with no tag/commit sha for release
- `--check-deprecated`: Check for deprecated dependencies
- `--check-forks`: Check for dependencies that are forks
- `--check-provenance`: Check for dependencies with no build attestation
- `--check-code-signature`: Check for dependencies with no/invalid code signature
**Note**: The `--check-release-tags` and `--check-forks` flags require `--check-source-code` to be enabled, as these checks can only be performed once the source code repository has been verified.
As an example of running specific checks:
```bash
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -s -pm yarn-berry --check-source-code --check-release-tags
```
This run will only check for dependencies with no link to source code repositories and dependencies with no tag/commit sha for release.
For **differential analysis**, it is currently not possible to specify individual checks -- all checks will be performed.
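The selection rules for static analysis described in this section can be summarized in a short sketch. The function `resolve_checks` and the check names (the flag names without the `--check-` prefix) are assumptions for illustration; the sketch also ignores per-package-manager support and is not the tool's actual implementation.

```python
# Check names are the flag names without the "--check-" prefix (illustrative).
ALL_CHECKS = (
    "source-code",
    "release-tags",
    "deprecated",
    "forks",
    "provenance",
    "code-signature",
)

# These checks need the source code repository check to be enabled as well.
REQUIRES_SOURCE_CODE = {"release-tags", "forks"}


def resolve_checks(requested):
    """Return the set of checks to run, given the explicitly requested ones."""
    requested = set(requested)
    unknown = requested - set(ALL_CHECKS)
    if unknown:
        raise ValueError(f"unknown checks: {sorted(unknown)}")
    if not requested:
        # No flag passed: every check is performed.
        return set(ALL_CHECKS)
    dependent = requested & REQUIRES_SOURCE_CODE
    if dependent and "source-code" not in requested:
        raise ValueError(f"{sorted(dependent)} require the source-code check")
    # At least one flag passed: only the flagged checks are performed.
    return requested
```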
### Notes
#### Inaccessible Tags
The release version specified in a lockfile/pom/similar is not always identical to
the tag used in the repository. This can happen for a variety of reasons. We have
compiled several tag formats that are reasonable to look up when the exact tag
specified in the lockfile/pom/similar is not found. They come from a combination of [AROMA](https://dl.acm.org/doi/pdf/10.1145/3643764)'s
work and our own research on this subject.
These formats are the following:
- `<tag>`
- `v<tag>`
- `r-<tag>`
- `release-<tag>`
- `parent-<tag>`
- `<package_name>@<tag>`
- `<package_name>-v<tag>`
- `<package_name>_v<tag>`
- `<package_name>-<tag>`
- `<package_name>_<tag>`
- `release/<tag>`
- `<tag>-release`
- `v.<tag>`
- `p1-p2-p3<tag>`
Note that `dirty-waters` failing to find a tag does not mean that the tag doesn't exist:
it means that it either doesn't exist, or that its format is not one of the above.
This list may be expanded in the future. If you feel that a relevant format is missing, please
open an issue and/or a pull request!
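To make the fallback lookup concrete, the listed formats can be generated for a given version and package name as in the sketch below. The function `candidate_tags` is a hypothetical helper, not the tool's code, and the `p1-p2-p3<tag>` format is left out because its placeholders are not defined here.

```python
def candidate_tags(version, package_name):
    """Return the tag formats listed above, in order, for one package version.

    Hypothetical helper; the `p1-p2-p3<tag>` format is intentionally omitted.
    """
    return [
        f"{version}",
        f"v{version}",
        f"r-{version}",
        f"release-{version}",
        f"parent-{version}",
        f"{package_name}@{version}",
        f"{package_name}-v{version}",
        f"{package_name}_v{version}",
        f"{package_name}-{version}",
        f"{package_name}_{version}",
        f"release/{version}",
        f"{version}-release",
        f"v.{version}",
    ]
```

A lookup would then try each candidate against the repository's tags and stop at the first match, for example by iterating over `candidate_tags("1.2.3", "lodash")`.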
## Academic Work
- [Dirty-Waters: Detecting Software Supply Chain Smells](https://arxiv.org/abs/2410.16049)
## Other issues not handled by dirty-waters
- Missing dependencies: simply run mvn/pip/... install :)
- Bloated dependencies: we recommend [DepClean](https://github.com/ASSERT-KTH/depclean) for Java, [depcheck](https://github.com/depcheck/depcheck) for npm
- Version constraint inconsistencies: we recommend [pipdeptree](https://github.com/tox-dev/pipdeptree) for Python
## License
MIT License.
Raw data
{
"_id": null,
"home_page": null,
"name": "dirty-waters",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.12",
"maintainer_email": null,
"keywords": "software supply chain, ssc, dependencies, npm",
"author": null,
"author_email": "Raphina Liu <raphina@kth.se>, Diogo Gaspar <dgaspar@kth.se>, Martin Monperrus <monperrus@kth.se>",
"download_url": "https://files.pythonhosted.org/packages/f2/4d/e6ef294b3b8bd7d8bab3feee2fec9ac7ef3d4bae3f0d41e9bb93dedd88ab/dirty_waters-0.33.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Automatically detect software supply chain smells and issues",
"version": "0.33.0",
"project_urls": {
"Bug Tracker": "https://github.com/chains-project/dirty-waters/issues",
"Homepage": "https://github.com/chains-project/dirty-waters"
},
"split_keywords": [
"software supply chain",
" ssc",
" dependencies",
" npm"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "7e42e84d602eb9059f9211a8bb15f29ca5df529b03dfe97cde9dbb4f273fcf17",
"md5": "dc0ce06a6f8b6113ab54858803999a95",
"sha256": "d13f29781f25d19d41542062d8e8ed8eb18ffadafed9ec9f9085c5d516a2446f"
},
"downloads": -1,
"filename": "dirty_waters-0.33.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "dc0ce06a6f8b6113ab54858803999a95",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.12",
"size": 41278,
"upload_time": "2025-01-16T10:31:53",
"upload_time_iso_8601": "2025-01-16T10:31:53.855379Z",
"url": "https://files.pythonhosted.org/packages/7e/42/e84d602eb9059f9211a8bb15f29ca5df529b03dfe97cde9dbb4f273fcf17/dirty_waters-0.33.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f24de6ef294b3b8bd7d8bab3feee2fec9ac7ef3d4bae3f0d41e9bb93dedd88ab",
"md5": "25d151bc426ef47986c5b298fb97e322",
"sha256": "c41b4bc5ad0a8b9040672dbcb96df1238a81f1f652a7d945e0ebee1c8a070ac0"
},
"downloads": -1,
"filename": "dirty_waters-0.33.0.tar.gz",
"has_sig": false,
"md5_digest": "25d151bc426ef47986c5b298fb97e322",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.12",
"size": 35895,
"upload_time": "2025-01-16T10:31:55",
"upload_time_iso_8601": "2025-01-16T10:31:55.910804Z",
"url": "https://files.pythonhosted.org/packages/f2/4d/e6ef294b3b8bd7d8bab3feee2fec9ac7ef3d4bae3f0d41e9bb93dedd88ab/dirty_waters-0.33.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-16 10:31:55",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "chains-project",
"github_project": "dirty-waters",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "attrs",
"specs": [
[
"==",
"24.2.0"
]
]
},
{
"name": "cattrs",
"specs": [
[
"==",
"24.1.2"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2024.8.30"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.4.0"
]
]
},
{
"name": "exceptiongroup",
"specs": [
[
"==",
"1.2.2"
]
]
},
{
"name": "GitPython",
"specs": [
[
"==",
"3.1.43"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.10"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"2.1.2"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.3"
]
]
},
{
"name": "platformdirs",
"specs": [
[
"==",
"4.3.6"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2024.2"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.32.3"
]
]
},
{
"name": "requests-cache",
"specs": [
[
"==",
"1.2.1"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "tabulate",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.5"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
"==",
"4.12.2"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2024.2"
]
]
},
{
"name": "url-normalize",
"specs": [
[
"==",
"1.4.3"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.2.3"
]
]
}
],
"lcname": "dirty-waters"
}