About PyDigger


Collection process

Using a cron job, we monitor the updates.xml file on PyPI. For each item we take the name, version number, description and pubDate and add the package to our database. Then we go over the newly added packages and try to fetch the JSON file describing each of them in detail from https://pypi.org/pypi/NAME/json.
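As a rough illustration of this step, here is a minimal sketch of reading the feed items and building the JSON URL for each package. It is not the actual PyDigger code: the function names parse_updates and json_url are hypothetical, and the sketch assumes that each item's title has the form "NAME VERSION".

```python
import xml.etree.ElementTree as ET


def parse_updates(xml_text):
    """Return one dict per <item> found in the RSS feed text.

    Assumption: the <title> of each item looks like "NAME VERSION".
    """
    root = ET.fromstring(xml_text)
    packages = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        # Split on the first space: the left part is the name, the rest the version
        name, _, version = title.partition(" ")
        packages.append({
            "name": name,
            "version": version,
            "description": (item.findtext("description") or "").strip(),
            "pubDate": (item.findtext("pubDate") or "").strip(),
        })
    return packages


def json_url(name):
    """Build the URL of the JSON file describing a package."""
    return "https://pypi.org/pypi/{}/json".format(name)
```

Fetching the feed and the JSON files, and storing the results in the database, are left out of this sketch on purpose.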

We extract many fields from the JSON file, for example maintainer, author, description and license. The keywords field is split by comma (,) or by space and saved as split_keywords.
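The splitting of the keywords field could look something like the following sketch. The exact rule used by PyDigger is not spelled out here, so this assumes that a comma, when present, takes precedence over spaces. The function name is borrowed from the split_keywords field name purely for illustration.

```python
def split_keywords(keywords):
    """Split a raw keywords string into a list of keywords.

    Assumption: if the string contains a comma, it is comma-separated;
    otherwise it is split on whitespace.
    """
    if not keywords:
        return []
    if "," in keywords:
        parts = keywords.split(",")
    else:
        parts = keywords.split()
    # Drop surrounding whitespace and empty entries
    return [part.strip() for part in parts if part.strip()]
```

With this rule "web, http client" gives two keywords ("web" and "http client"), while "web http client" gives three.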

We also try to extract the link to the Version Control System (VCS) used for the development of the package. For this we rely on the home_page field. Currently, we only check if this leads to a project on GitHub.
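A minimal sketch of such a check is shown below. The regular expression, the function name detect_github and the shape of the returned dictionary are all assumptions made for this example; they are not taken from the PyDigger source.

```python
import re

# Matches URLs such as https://github.com/USER/PROJECT (hypothetical pattern)
GITHUB_RE = re.compile(
    r"^https?://(?:www\.)?github\.com/([^/\s]+)/([^/\s#?]+)",
    re.IGNORECASE,
)


def detect_github(home_page):
    """Return GitHub user and project found in home_page, or None."""
    if not home_page:
        return None
    match = GITHUB_RE.match(home_page.strip())
    if match is None:
        return None
    user, project = match.group(1), match.group(2)
    # Clone URLs sometimes end in .git, which is not part of the project name
    if project.endswith(".git"):
        project = project[:-4]
    return {"vcs": "github", "user": user, "project": project}
```

Any home_page value that does not point to a project on GitHub yields None, which corresponds to "no VCS identified".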

About

Once upon a time there was a service called PyCheesecake that measured certain "kwalitee" metrics of PyPI packages, trying to hint at the quality of those packages. The idea was based on the "kwalitee" metrics of the Perl community offered by the CPANTS service.

See also: npms.io.

PyDigger has similarities, but it does not try to say that one package is of higher quality than another. It does not even indicate any kwalitee. PyDigger tries to offer a view of the Python packages on PyPI. It provides information on "common practices" and also tries to hint at which are better practices or the "best practices".

So what kind of things do we look at?

In the first part of the project we look at the JSON file provided by PyPI and check things there. For example, we check if there is a license and what that license is. At some point we will recommend what kind of values should be in that field and what project developers should do with the other values.

We also look at the keywords that were supplied with the project. Having these keywords can make it easier for people to find related packages. On the other hand having too many keywords might be spamming the system.

Removing a package

Currently there is no automatic way to remove a package, as that would require either some trigger from PyPI or that I go over all the packages once in a while to see if any has been removed from PyPI. If you have removed a package from PyPI and would like to remove it from here as well, let me know, either by a friendly e-mail or a friendly issue on the GitHub repository.

Author and source

Maintained by Gabor Szabo in this GitHub repository.

Stats

On the stats page you can see some information on various metrics. The information is extracted from the JSON file provided by PyPI, and if it has a link to the GitHub repository of the project then further data is collected from the GitHub repository.

Version Control - GitHub

Information about the Version Control System used for the project is extracted from the home_page field of the JSON file provided by PyPI. Currently it can only identify GitHub.

Tools

A bunch of tools that might be useful.

Testing libraries

lint and static analysis

Other tools
