pypi-package-rot


name: pypi-package-rot
version: 0.1.0
home_page: https://github.com/LucaCappelletti94/pypi-package-rot
summary: Tools to investigate the state of PyPI packages.
upload_time: 2024-11-03 11:23:06
maintainer: None
docs_url: None
author: LucaCappelletti94
requires_python: >=3.9
license: MIT
requirements: No requirements were recorded.
Travis-CI: No Travis.
coveralls test coverage: No coveralls.

# PyPI package rot

[![PyPI](https://badge.fury.io/py/pypi-package-rot.svg)](https://badge.fury.io/py/pypi-package-rot)
[![Downloads](https://pepy.tech/badge/pypi-package-rot)](https://pepy.tech/badge/pypi-package-rot)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/LucaCappelletti94/pypi-package-rot/blob/master/LICENSE)
[![CI](https://github.com/LucaCappelletti94/pypi-package-rot/actions/workflows/python.yml/badge.svg)](https://github.com/LucaCappelletti94/pypi-package-rot/actions)

Investigating the state of package rot on PyPI.

## Introduction

The objective of this dataset is to provide a snapshot of the state of PyPI packages, in order to facilitate investigations into the state of the Python ecosystem. While its primary goal is to facilitate the identification of package rot, it can be used for other purposes as well.

**At this time, we are still building the dataset and have processed about 200k of roughly 600k packages. We expect to have the dataset ready by the end of 2024.**

## Installing

As usual, you can install the package with pip:

```bash
pip install pypi-package-rot
```

## CLI

The package provides the following CLI utilities to let you replicate the dataset. Keep in mind that this operation takes a long time, as it involves scraping information from websites that allow about 1 request per second.

### Perpetual scraper

This command scrapes PyPI packages and stores their metadata in the cache directory. Since the API rate limit allows only about 1 request per second, retrieving all the packages takes a long time (currently about 8 days). The utility runs in perpetuity, continuously retrieving new packages.

The email is needed to build an adequate user-agent string for the requests, so that PyPI can contact you if you are abusing the API.

```bash
pypi_package_rot perpetual_scraper --email "your@email.com"
```
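
The two constraints described above (roughly one request per second, and a user-agent that carries a contact email) can be sketched as follows. This is a minimal illustration, not the package's actual implementation: the helper names `build_user_agent`, `throttled` and `fetch_metadata`, the user-agent format, and the one-second default interval are assumptions made for the example. The `https://pypi.org/pypi/<name>/json` endpoint is PyPI's public JSON API.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Optional

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def build_user_agent(email: str, tool: str = "pypi-package-rot-example") -> str:
    """Return a user-agent string carrying a contact address (hypothetical format)."""
    return f"{tool} (contact: {email})"


def throttled(
    items: Iterable[str],
    min_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items no faster than one per `min_interval` seconds."""
    last: Optional[float] = None
    for item in items:
        if last is not None:
            # Wait out whatever remains of the interval since the last item.
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


def fetch_metadata(name: str, email: str) -> dict:
    """Fetch one package's metadata from the PyPI JSON API."""
    request = urllib.request.Request(
        PYPI_JSON_URL.format(name=name),
        headers={"User-Agent": build_user_agent(email)},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A scraping loop would then look like `for name in throttled(package_names): fetch_metadata(name, "your@email.com")`. Passing `clock` and `sleep` as parameters is only a convenience so the throttling logic can be exercised without real delays.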

### Build the dataset

After the scraper has been downloading package metadata for a while, you can build the summarized, anonymized dataset with the following command:

```bash
pypi_package_rot perpetual_builder --verbose --output "pypi.rot.v1.summarized.csv"  --email "your@email.com"
```

Alternatively, to include all the information, you can use the following command:

```bash
pypi_package_rot perpetual_builder --verbose --full --output "pypi.rot.v1.full.json"  --email "your@email.com"
```

Part of the procedure involves testing whether the URLs in the metadata are still valid. This is done by sending a HEAD request to each URL and checking the status code. These operations may take a long time (about one month at the time of writing). As with the previous command, the email is needed to build an adequate user-agent string for the requests, so that websites can contact you if you are abusing their services.
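
The HEAD-request check described above can be sketched with the standard library alone. This is an illustrative approximation, not the code the package runs: the function names `head_status` and `classify`, the timeout, and the coarse labels are all assumptions made for the example.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional


def head_status(url: str, user_agent: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and return the HTTP status code, or None if unreachable."""
    try:
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": user_agent}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions but still carry a status code.
        return error.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, malformed URL or broken reply.
        return None


def classify(status: Optional[int]) -> str:
    """Map a status code to a coarse label (the labels are illustrative)."""
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "alive"
    if status in (404, 410):
        return "gone"
    return "error"
```

For instance, `classify(head_status(url, "example-agent (contact: your@email.com)"))` yields one label per URL. Some servers reject HEAD requests (for example with status 405), so a real checker may need to fall back to GET before declaring a URL dead.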

## Contributing

Do you have some ideas to improve this dataset or associated analysis? Feel free to open an issue or a pull request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/LucaCappelletti94/pypi-package-rot",
    "name": "pypi-package-rot",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "LucaCappelletti94",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/17/d6/4b228f530377f57f6448c730572bc5bd319312e60698159c5e53588e56df/pypi_package_rot-0.1.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Tools to investigate the state of PyPI packages.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://github.com/LucaCappelletti94/pypi-package-rot"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "17d64b228f530377f57f6448c730572bc5bd319312e60698159c5e53588e56df",
                "md5": "84ea9d017fb25d4ddf600c65bcd9e081",
                "sha256": "d9c68064eeb6c838fa2a252c61fc6953fd1161559c45f8ada03e712c884a860b"
            },
            "downloads": -1,
            "filename": "pypi_package_rot-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "84ea9d017fb25d4ddf600c65bcd9e081",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 12754,
            "upload_time": "2024-11-03T11:23:06",
            "upload_time_iso_8601": "2024-11-03T11:23:06.246795Z",
            "url": "https://files.pythonhosted.org/packages/17/d6/4b228f530377f57f6448c730572bc5bd319312e60698159c5e53588e56df/pypi_package_rot-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-11-03 11:23:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "LucaCappelletti94",
    "github_project": "pypi-package-rot",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pypi-package-rot"
}
        