dumbr-pypi


Namedumbr-pypi JSON
Version 1.13.0 PyPI version JSON
download
home_pagehttps://github.com/chriskuehl/dumb-pypi
Summary
upload_time2023-03-21 16:36:29
maintainer
docs_urlNone
authorChris Kuehl
requires_python>=3.7
licenseApache License 2.0
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            dumb-pypi
---------

[![Build Status](https://github.com/chriskuehl/dumb-pypi/actions/workflows/ci.yaml/badge.svg)](https://github.com/chriskuehl/dumb-pypi/actions/workflows/ci.yaml)
[![PyPI version](https://badge.fury.io/py/dumb-pypi.svg)](https://pypi.python.org/pypi/dumb-pypi)


`dumb-pypi` is a simple read-only PyPI index server generator, backed entirely
by static files. It is ideal for internal use by organizations that have a
bunch of their own packages which they'd like to make available.

You can view [an example generated repo](https://chriskuehl.github.io/dumb-pypi/test-repo/).


## A rant about static files (and why you should use dumb-pypi)

The main difference between dumb-pypi and other PyPI implementations is that
dumb-pypi has *no server component*. It's just a script that, given a list of
Python package names, generates a bunch of static files which you can serve
from any webserver, or even directly from S3.

There's something magical about being able to serve a package repository
entirely from a tree of static files. It's incredibly easy to make it fast and
highly-available when you don't need to worry about running a bunch of
application servers (which are serving a bunch of read-only queries that could
have just been pre-generated).

Linux distributions have been doing this right for decades. Debian has a system
of hundreds of mirrors, and the entire thing is powered entirely by some fancy
`rsync` commands.

For the maintainer of a PyPI repositry, `dumb-pypi` has some nice properties:

* **File serving is extremely fast.** nginx can serve your static files faster
  than you'd ever need. In practice, there are almost no limits on the number
  of packages or number of versions per package.

* **It's very simple.** There's no complicated WSGI app to deploy, no
  databases, and no caches. You just need to run the script whenever you have
  new packages, and your index server is ready in seconds.

For more about why this design was chosen, see the detailed
[`RATIONALE.md`][rationale] in this repo.


## Usage

To use dumb-pypi, you need two things:

* A script which generates the index. (That's this project!)

* A generic webserver to serve the generated index.

  This part is up to you. For example, you might sync the built index into an
  S3 bucket, and serve it directly from S3. You might run nginx from the built
  index locally.

My recommended high-availability (but still quite simple) deployment is:

* Store all of the packages in S3.

* Have a cronjob (or equivalent) which rebuilds the index based on the packages
  in S3. This is incredibly fast—it would not be unreasonable to do it every
  sixty seconds. After building the index, sync it into a separate S3 bucket.

* Have a webserver (or set of webservers behind a load balancer) running nginx
  (with the config provided below), with the source being that second S3
  bucket.


### Generating static files

First, install `dumb-pypi` somewhere (e.g. into a virtualenv).

By design, dumb-pypi does *not* require you to have the packages available when
building the index. You only need a list of filenames, one per line. For
example:

```
dumb-init-1.1.2.tar.gz
dumb_init-1.2.0-py2.py3-none-manylinux1_x86_64.whl
ocflib-2016.10.31.0.40-py2.py3-none-any.whl
pre_commit-0.9.2.tar.gz
```

You should also know a URL to access these packages (if you serve them from the
same host as the index, it can be a relative URL). For example, it might be
`https://my-pypi-packages.s3.amazonaws.com/` or `../../pool/`.

You can then invoke the script:

```bash
$ dumb-pypi \
    --package-list my-packages \
    --packages-url https://my-pypi-packages.s3.amazonaws.com/ \
    --output-dir my-built-index
```

The built index will be in `my-built-index`. It's now up to you to figure out
how to serve that with a webserver (nginx is a good option — details below!).


#### Additional options for packages

You can extend the capabilities of your registry using the extended JSON input
syntax when providing your package list to dumb-pypi. Instead of using the
format listed above of one filename per line, format your file with one JSON
object per line, like this:

```json
{"filename": "dumb-init-1.1.2.tar.gz", "hash": "md5=<hash>", "requires_dist": ["cfgv"], "requires_python": ">=3.6", "uploaded_by": "ckuehl", "upload_timestamp": 1512539924}
```

The `filename` key is required. All other keys are optional and will be used to
provide additional information in your generated repository. This extended
information can be useful to determine, for example, who uploaded a package.
(Most of this information is useful in the web UI by humans, not by pip.)

Where should you get information about the hash, uploader, etc? That's up to
you—dumb-pypi isn't in the business of storing or calculating this data. If
you're using S3, one easy option is to store it at upload time as [S3
metadata][s3-metadata].


#### Partial rebuild support

If you want to avoid rebuilding your entire registry constantly, you can pass
the `--previous-package-list` (or `--previous-package-list-json`) argument to
dumb-pypi, pointing to the list you used the last time you called dumb-pypi.
Only the files relating to changed packages will be rebuilt, saving you time
and unnecessary I/O.

The previous package list json is available in the output as `packages.json`.


### Recommended nginx config

You can serve the packages from any static webserver (including directly from
S3), but for compatibility with old versions of pip, it's necessary to do a
tiny bit of URL rewriting (see [`RATIONALE.md`][rationale] for full details
about the behavior of various pip versions).

In particular, if you want to support old pip versions, you need to apply this
logic to package names (taken from [PEP 503][pep503]):

```python
def normalize(name):
    return re.sub(r'[-_.]+', '-', name).lower()
```

Here is an example nginx config which supports all versions of pip and
easy_install:

```nginx
server {
    location / {
        root /path/to/index;
        set_by_lua $canonical_uri "return string.gsub(string.lower(ngx.var.uri), '[-_.]+', '-')";
        try_files $uri $uri/index.html $canonical_uri $canonical_uri/index.html =404;
    }
}

```

If you don't care about easy_install or versions of pip prior to 8.1.2, you can
omit the `canonical_uri` hack.


### Using your deployed index server with pip

When running pip, pass `-i https://my-pypi-server/simple` or set the
environment variable `PIP_INDEX_URL=https://my-pypi-server/simple`.


### Known incompatibilities with public PyPI

We try to maintain compatibility with the standard PyPI interface, but there
are some incompatibilities currently which are hard to fix due to dumb-pypi's
design:

* While [both JSON API endpoints][json-api] are supported, many keys in the
  JSON API are not present since they require inspecting packages which
  dumb-pypi can't do. Some of these, like `requires_python` and
  `requires_dist`, can be passed in as JSON.

* The [per-version JSON API endpoint][per-version-api] only includes data about
  the current requested version and not _all_ versions, unlike public PyPI. In
  other words, if you access `/pypi/<package>/1.0.0/json`, you will only see
  the `1.0.0` release under the `releases` key and not every release ever made.
  The regular non-versioned API route (`/pypi/<package>/json`) will have all
  releases.


## Contributing

Thanks for contributing! To get started, run `make venv` and then `.
venv/bin/activate` to source the virtualenv. You should now have a `dumb-pypi`
command on your path using your checked-out version of the code.

To run the tests, call `make test`. To run an individual test, you can do
`pytest -k name_of_test tests` (with the virtualenv activated).


[rationale]: https://github.com/chriskuehl/dumb-pypi/blob/master/RATIONALE.md
[pep503]: https://www.python.org/dev/peps/pep-0503/#normalized-names
[s3-metadata]: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#UserMetadata
[json-api]: https://warehouse.pypa.io/api-reference/json.html
[per-version-api]: https://warehouse.pypa.io/api-reference/json.html#get--pypi--project_name---version--json

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/chriskuehl/dumb-pypi",
    "name": "dumbr-pypi",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "",
    "author": "Chris Kuehl",
    "author_email": "ckuehl@ckuehl.me",
    "download_url": "https://files.pythonhosted.org/packages/af/a5/e3c33dc4a965c72af2b7e0bed281eca6a456bfbc486f4886e25ed8a54243/dumbr_pypi-1.13.0.tar.gz",
    "platform": null,
    "description": "dumb-pypi\n---------\n\n[![Build Status](https://github.com/chriskuehl/dumb-pypi/actions/workflows/ci.yaml/badge.svg)](https://github.com/chriskuehl/dumb-pypi/actions/workflows/ci.yaml)\n[![PyPI version](https://badge.fury.io/py/dumb-pypi.svg)](https://pypi.python.org/pypi/dumb-pypi)\n\n\n`dumb-pypi` is a simple read-only PyPI index server generator, backed entirely\nby static files. It is ideal for internal use by organizations that have a\nbunch of their own packages which they'd like to make available.\n\nYou can view [an example generated repo](https://chriskuehl.github.io/dumb-pypi/test-repo/).\n\n\n## A rant about static files (and why you should use dumb-pypi)\n\nThe main difference between dumb-pypi and other PyPI implementations is that\ndumb-pypi has *no server component*. It's just a script that, given a list of\nPython package names, generates a bunch of static files which you can serve\nfrom any webserver, or even directly from S3.\n\nThere's something magical about being able to serve a package repository\nentirely from a tree of static files. It's incredibly easy to make it fast and\nhighly-available when you don't need to worry about running a bunch of\napplication servers (which are serving a bunch of read-only queries that could\nhave just been pre-generated).\n\nLinux distributions have been doing this right for decades. Debian has a system\nof hundreds of mirrors, and the entire thing is powered entirely by some fancy\n`rsync` commands.\n\nFor the maintainer of a PyPI repositry, `dumb-pypi` has some nice properties:\n\n* **File serving is extremely fast.** nginx can serve your static files faster\n  than you'd ever need. In practice, there are almost no limits on the number\n  of packages or number of versions per package.\n\n* **It's very simple.** There's no complicated WSGI app to deploy, no\n  databases, and no caches. You just need to run the script whenever you have\n  new packages, and your index server is ready in seconds.\n\nFor more about why this design was chosen, see the detailed\n[`RATIONALE.md`][rationale] in this repo.\n\n\n## Usage\n\nTo use dumb-pypi, you need two things:\n\n* A script which generates the index. (That's this project!)\n\n* A generic webserver to serve the generated index.\n\n  This part is up to you. For example, you might sync the built index into an\n  S3 bucket, and serve it directly from S3. You might run nginx from the built\n  index locally.\n\nMy recommended high-availability (but still quite simple) deployment is:\n\n* Store all of the packages in S3.\n\n* Have a cronjob (or equivalent) which rebuilds the index based on the packages\n  in S3. This is incredibly fast\u2014it would not be unreasonable to do it every\n  sixty seconds. After building the index, sync it into a separate S3 bucket.\n\n* Have a webserver (or set of webservers behind a load balancer) running nginx\n  (with the config provided below), with the source being that second S3\n  bucket.\n\n\n### Generating static files\n\nFirst, install `dumb-pypi` somewhere (e.g. into a virtualenv).\n\nBy design, dumb-pypi does *not* require you to have the packages available when\nbuilding the index. You only need a list of filenames, one per line. For\nexample:\n\n```\ndumb-init-1.1.2.tar.gz\ndumb_init-1.2.0-py2.py3-none-manylinux1_x86_64.whl\nocflib-2016.10.31.0.40-py2.py3-none-any.whl\npre_commit-0.9.2.tar.gz\n```\n\nYou should also know a URL to access these packages (if you serve them from the\nsame host as the index, it can be a relative URL). For example, it might be\n`https://my-pypi-packages.s3.amazonaws.com/` or `../../pool/`.\n\nYou can then invoke the script:\n\n```bash\n$ dumb-pypi \\\n    --package-list my-packages \\\n    --packages-url https://my-pypi-packages.s3.amazonaws.com/ \\\n    --output-dir my-built-index\n```\n\nThe built index will be in `my-built-index`. It's now up to you to figure out\nhow to serve that with a webserver (nginx is a good option \u2014 details below!).\n\n\n#### Additional options for packages\n\nYou can extend the capabilities of your registry using the extended JSON input\nsyntax when providing your package list to dumb-pypi. Instead of using the\nformat listed above of one filename per line, format your file with one JSON\nobject per line, like this:\n\n```json\n{\"filename\": \"dumb-init-1.1.2.tar.gz\", \"hash\": \"md5=<hash>\", \"requires_dist\": [\"cfgv\"], \"requires_python\": \">=3.6\", \"uploaded_by\": \"ckuehl\", \"upload_timestamp\": 1512539924}\n```\n\nThe `filename` key is required. All other keys are optional and will be used to\nprovide additional information in your generated repository. This extended\ninformation can be useful to determine, for example, who uploaded a package.\n(Most of this information is useful in the web UI by humans, not by pip.)\n\nWhere should you get information about the hash, uploader, etc? That's up to\nyou\u2014dumb-pypi isn't in the business of storing or calculating this data. If\nyou're using S3, one easy option is to store it at upload time as [S3\nmetadata][s3-metadata].\n\n\n#### Partial rebuild support\n\nIf you want to avoid rebuilding your entire registry constantly, you can pass\nthe `--previous-package-list` (or `--previous-package-list-json`) argument to\ndumb-pypi, pointing to the list you used the last time you called dumb-pypi.\nOnly the files relating to changed packages will be rebuilt, saving you time\nand unnecessary I/O.\n\nThe previous package list json is available in the output as `packages.json`.\n\n\n### Recommended nginx config\n\nYou can serve the packages from any static webserver (including directly from\nS3), but for compatibility with old versions of pip, it's necessary to do a\ntiny bit of URL rewriting (see [`RATIONALE.md`][rationale] for full details\nabout the behavior of various pip versions).\n\nIn particular, if you want to support old pip versions, you need to apply this\nlogic to package names (taken from [PEP 503][pep503]):\n\n```python\ndef normalize(name):\n    return re.sub(r'[-_.]+', '-', name).lower()\n```\n\nHere is an example nginx config which supports all versions of pip and\neasy_install:\n\n```nginx\nserver {\n    location / {\n        root /path/to/index;\n        set_by_lua $canonical_uri \"return string.gsub(string.lower(ngx.var.uri), '[-_.]+', '-')\";\n        try_files $uri $uri/index.html $canonical_uri $canonical_uri/index.html =404;\n    }\n}\n\n```\n\nIf you don't care about easy_install or versions of pip prior to 8.1.2, you can\nomit the `canonical_uri` hack.\n\n\n### Using your deployed index server with pip\n\nWhen running pip, pass `-i https://my-pypi-server/simple` or set the\nenvironment variable `PIP_INDEX_URL=https://my-pypi-server/simple`.\n\n\n### Known incompatibilities with public PyPI\n\nWe try to maintain compatibility with the standard PyPI interface, but there\nare some incompatibilities currently which are hard to fix due to dumb-pypi's\ndesign:\n\n* While [both JSON API endpoints][json-api] are supported, many keys in the\n  JSON API are not present since they require inspecting packages which\n  dumb-pypi can't do. Some of these, like `requires_python` and\n  `requires_dist`, can be passed in as JSON.\n\n* The [per-version JSON API endpoint][per-version-api] only includes data about\n  the current requested version and not _all_ versions, unlike public PyPI. In\n  other words, if you access `/pypi/<package>/1.0.0/json`, you will only see\n  the `1.0.0` release under the `releases` key and not every release ever made.\n  The regular non-versioned API route (`/pypi/<package>/json`) will have all\n  releases.\n\n\n## Contributing\n\nThanks for contributing! To get started, run `make venv` and then `.\nvenv/bin/activate` to source the virtualenv. You should now have a `dumb-pypi`\ncommand on your path using your checked-out version of the code.\n\nTo run the tests, call `make test`. To run an individual test, you can do\n`pytest -k name_of_test tests` (with the virtualenv activated).\n\n\n[rationale]: https://github.com/chriskuehl/dumb-pypi/blob/master/RATIONALE.md\n[pep503]: https://www.python.org/dev/peps/pep-0503/#normalized-names\n[s3-metadata]: https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#UserMetadata\n[json-api]: https://warehouse.pypa.io/api-reference/json.html\n[per-version-api]: https://warehouse.pypa.io/api-reference/json.html#get--pypi--project_name---version--json\n",
    "bugtrack_url": null,
    "license": "Apache License 2.0",
    "summary": "",
    "version": "1.13.0",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "aa3ce0ea8cf36e2749c27fcdba066ef787cb2b205e4d7074822debe0716146a7",
                "md5": "e8abdae12f420c1be3b6c8f24851acd5",
                "sha256": "6c0e87661d1bae8883f7bd84f944005a1dd0c5025596d70d47418757436ae31d"
            },
            "downloads": -1,
            "filename": "dumbr_pypi-1.13.0-py2.py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e8abdae12f420c1be3b6c8f24851acd5",
            "packagetype": "bdist_wheel",
            "python_version": "py2.py3",
            "requires_python": ">=3.7",
            "size": 15977,
            "upload_time": "2023-03-21T16:36:27",
            "upload_time_iso_8601": "2023-03-21T16:36:27.228876Z",
            "url": "https://files.pythonhosted.org/packages/aa/3c/e0ea8cf36e2749c27fcdba066ef787cb2b205e4d7074822debe0716146a7/dumbr_pypi-1.13.0-py2.py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "afa5e3c33dc4a965c72af2b7e0bed281eca6a456bfbc486f4886e25ed8a54243",
                "md5": "a2a1340a571be7ccaa2102751fbc0d35",
                "sha256": "6d4f642f470c399140be4af0ad06470f43fdb101ae99b62a4a7a8d07ef03cc2d"
            },
            "downloads": -1,
            "filename": "dumbr_pypi-1.13.0.tar.gz",
            "has_sig": false,
            "md5_digest": "a2a1340a571be7ccaa2102751fbc0d35",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 17171,
            "upload_time": "2023-03-21T16:36:29",
            "upload_time_iso_8601": "2023-03-21T16:36:29.650515Z",
            "url": "https://files.pythonhosted.org/packages/af/a5/e3c33dc4a965c72af2b7e0bed281eca6a456bfbc486f4886e25ed8a54243/dumbr_pypi-1.13.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-21 16:36:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "chriskuehl",
    "github_project": "dumb-pypi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "tox": true,
    "lcname": "dumbr-pypi"
}
        
Elapsed time: 0.05970s