Using a cron job, we monitor the updates.xml file on PyPI. For each item we take the name, version number, description, and pubDate, and add the package to our database. Then we go over the newly added packages and try to fetch the JSON file that describes each of them in detail from https://pypi.org/pypi/NAME/json.
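The feed-parsing step can be sketched roughly like this. This is an illustrative sketch, not PyDigger's actual code: the function name, the dict layout, and the assumption that the feed is standard RSS (with `title` holding "NAME VERSION") are mine; the feed URL is assumed to be https://pypi.org/rss/updates.xml.

```python
import xml.etree.ElementTree as ET

def parse_updates_feed(xml_text):
    """Extract name, version, description, and pubDate from the RSS feed.

    Each <item>'s <title> is assumed to look like "requests 2.31.0".
    """
    root = ET.fromstring(xml_text)
    packages = []
    for item in root.iter("item"):
        title = item.findtext("title", "")
        name, _, version = title.partition(" ")
        packages.append({
            "name": name,
            "version": version,
            "description": item.findtext("description", ""),
            "pubDate": item.findtext("pubDate", ""),
        })
    return packages
```

Each entry would then be saved to the database, and the per-package details fetched from `https://pypi.org/pypi/NAME/json` in a second pass.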
We extract a lot of fields from the JSON file, for example maintainer, author, description, license, and many more. The keywords field is split by comma or by space and saved as split_keywords. We also try to extract the link to the Version Control System (VCS) used for the development of the package. For this we rely on the home_page field. Currently, we only check whether it leads to a project on GitHub.
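The two extraction steps just described might look something like the following sketch. The exact normalization (stripping, lowercasing) and the GitHub URL pattern are my assumptions, not necessarily what PyDigger does; only the field names `keywords`, `split_keywords`, and `home_page` come from the text above.

```python
import re

def split_keywords(keywords):
    """Split the keywords field by comma if one is present, else by whitespace."""
    if not keywords:
        return []
    parts = keywords.split(",") if "," in keywords else keywords.split()
    # Illustrative normalization: trim whitespace and lowercase each keyword.
    return [part.strip().lower() for part in parts if part.strip()]

def extract_vcs(home_page):
    """Return the GitHub repository URL if home_page points at one, else None."""
    if not home_page:
        return None
    match = re.match(r"https?://github\.com/[\w.-]+/[\w.-]+", home_page)
    return match.group(0) if match else None
```

For example, `split_keywords("web, scraping, http")` and `split_keywords("web scraping http")` both yield the same three keywords, which makes the stored `split_keywords` values comparable across packages regardless of which separator the author used.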
See also: npms.io.
PyDigger has similarities, but it does not try to say that one package is of higher quality than another. It does not even indicate any kwalitee. PyDigger tries to offer a view of the Python packages on PyPI: it provides information on "common practices" and also tries to hint at which are the better practices or the "best practices".
So what kind of things do we look at?
In the first part of the project we look at the JSON file provided by PyPI and check things there. For example, we check whether there is a license and what that license is. At some point we will recommend what kind of values should go in that field and what project developers should do with the other values.
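A check like the license one could be sketched as below. This is hypothetical: the `KNOWN_LICENSES` set is an illustrative sample, and the classification labels are mine; the only thing taken from the text is that the license lives in the package's JSON metadata.

```python
# Illustrative sample of recognized license names, not PyDigger's actual list.
KNOWN_LICENSES = {"MIT", "BSD", "Apache-2.0", "GPL-3.0"}

def check_license(metadata):
    """Classify the license field of a package's PyPI JSON 'info' section."""
    license_field = (metadata.get("info", {}).get("license") or "").strip()
    if not license_field:
        return "missing"
    if license_field in KNOWN_LICENSES:
        return "known"
    return "unrecognized"
```

Running this over every package would give the kind of aggregate view described above: how many packages declare no license at all, and which free-text values show up in the field that a recommendation could later normalize.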
We also look at the keywords that were supplied with the project. Having these keywords can make it easier for people to find related packages. On the other hand, having too many keywords might be spamming the system.