preloaded

Name: preloaded
Version: 1.20230516.122456
Home page: https://github.com/albertz/python-preloaded
Summary: Python Preloaded - Bundle Python executable with preloaded modules
Upload time: 2023-05-16 12:26:18
Author: Albert Zeyer
License: MIT License
# Python Preloaded

Project repo: https://github.com/albertz/python-preloaded

Problem:

The startup time of CPython including
loading big libraries like PyTorch or TensorFlow is too slow.
In case of slow file systems, I have seen startup times including such import
of 10-20 seconds.

Very simple idea:

Keep the state of CPython
right after we imported the big libraries
and make it available instantly when needed.
When loading the state,
we can continue to run any random Python script
(we can use [runpy](https://docs.python.org/3/library/runpy.html)).
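A minimal sketch of the `runpy` part of this idea: once the interpreter state (with the heavy imports) is available, any script can be executed as if it were started with `python script.py`. The file name `demo_script.py` here is a hypothetical stand-in created just for the demo.

```python
import runpy
import pathlib

# A stand-in user script (hypothetical file name, created here for the demo):
pathlib.Path("demo_script.py").write_text("result = 6 * 7\n")

# In the preloaded process, the heavy modules are already in sys.modules;
# runpy then executes an arbitrary script exactly as `python demo_script.py`
# would, reusing the already-initialized interpreter state.
# run_path returns the resulting module globals:
globals_after = runpy.run_path("demo_script.py", run_name="__main__")
print(globals_after["result"])  # → 42
```

Because `run_name="__main__"` is set, the script behaves as if it were the main program, so `if __name__ == "__main__":` blocks run as expected.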


## Installation

```
pip install preloaded
```

Now you should be able to run `py-preloaded-bundle-fork-server.py`.
For example usage, see the example below.


## Method 1: Fork server

Start CPython and import the libraries.
Then keep the process running as a fork server.
Whenever a new instance is needed, we fork (`os.fork`)
and apply logic similar to [reptyr](https://github.com/nelhage/reptyr).
Some technical details are [here](https://github.com/albertz/python-preloaded/blob/main/docs/pty-details.md).

This solution is very portable across Unix.
I have tested it so far on Linux and macOS,
but it should run on most other Unixes as well.
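The core of the fork-server idea can be sketched as follows. This is a simplified illustration, not the project's actual implementation: it omits the socket handling and the PTY forwarding described in the linked details document, and `run_in_fork` is a hypothetical helper name.

```python
import os
import runpy

# The heavy imports happen once here, in the long-lived server process,
# e.g. `import tensorflow`.

def run_in_fork(script_path: str) -> None:
    """Fork a child that reuses the already-imported modules to run a script."""
    pid = os.fork()
    if pid == 0:
        # Child: execute the requested script, then exit immediately,
        # without running the parent's cleanup handlers.
        try:
            runpy.run_path(script_path, run_name="__main__")
        finally:
            os._exit(0)
    # Parent: reap the child. A real fork server would instead keep
    # accepting requests and forward the client's PTY to the child.
    os.waitpid(pid, 0)
```

Because `os.fork` copies the process, the child starts with all the expensive imports already done, which is the whole point of the method.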

### Example

Create the starter script `python-tf.bin`:
```
$ py-preloaded-bundle-fork-server.py tensorflow -o python-tf.bin
```
This starter script is intended as a drop-in replacement for `python` itself.

For testing, there is `demo-import-tensorflow.py`, with only the following content:
```python
import tensorflow as tf
print("TF:", tf.__version__)
```

Now try to run it directly, and measure the time: 
```
$ time python3 demo-import-tensorflow.py
TF: 2.3.0

________________________________________________________
Executed in    8.31 secs    fish           external
   usr time    3.39 secs  278.00 micros    3.39 secs
   sys time    0.67 secs   83.00 micros    0.67 secs
```
This measurement is on a slow filesystem (NFS, specifically),
and already with the files cached (the same command was run immediately before).
Otherwise, the startup time is even over 14 seconds.

The starter script was not run yet, so the first start is just as slow:
```
$ time ./python-tf.bin demo-import-tensorflow.py
Existing socket but can not connect: [Errno 111] Connection refused
Import module: tensorflow
TF: 2.3.0

________________________________________________________
Executed in    8.35 secs    fish           external
   usr time    3.19 secs  768.00 micros    3.19 secs
   sys time    0.72 secs  228.00 micros    0.72 secs
```

Now the fork server is running in the background.
It is in no way tied to `demo-import-tensorflow.py`
and could run any other script at this point.
However, we continue the demo with the same script:
```
$ time ./python-tf.bin demo-import-tensorflow.py
Existing socket, connected
Open new PTY
Send PTY fd to server
Wait for server to be ready
Entering PTY proxy loop
TF: 2.3.0

________________________________________________________
Executed in  261.56 millis    fish           external
   usr time   64.24 millis  542.00 micros   63.70 millis
   sys time   33.59 millis  163.00 micros   33.43 millis
```
As you can see, the startup time is now very fast.
It stays just as fast when executed at a later time,
when the files are no longer cached.

Interactively test the starter script environment:
```
$ ./python-tf.bin -m IPython
```


## Method 2: Process pool

We always keep some pool (e.g. N=10 instances)
of CPython + preloaded libraries alive in the background,
and once we need a new instance, we just pick one from the pool.

This shares a lot of logic with the fork server.
The main difference is that we use `subprocess.Popen` instead of `os.fork`.

(Currently not implemented)
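Since this method is not implemented yet, here is only a rough sketch of the idea. The `PreloadedPool` class name and its interface are illustrative, not the project's API; a real pool would also refill itself in the background.

```python
import subprocess
import sys

class PreloadedPool:
    """Keep a few idle interpreters alive; hand one out per request (sketch)."""

    def __init__(self, size: int = 10, preload: str = "import tensorflow"):
        # Each worker does the heavy import up front, then blocks until it
        # receives a script path on stdin, and runs that script via runpy.
        worker_code = (
            preload + "\n"
            "import runpy, sys\n"
            "runpy.run_path(sys.stdin.readline().strip(), run_name='__main__')\n"
        )
        self.workers = [
            subprocess.Popen([sys.executable, "-c", worker_code],
                             stdin=subprocess.PIPE, text=True)
            for _ in range(size)
        ]

    def run(self, script_path: str) -> int:
        # Take an idle worker; a real pool would spawn a replacement here
        # to keep the pool full.
        worker = self.workers.pop()
        worker.stdin.write(script_path + "\n")
        worker.stdin.flush()
        worker.stdin.close()
        return worker.wait()
```

Unlike the fork server, the workers here are independent processes started ahead of time, so this approach would also work on platforms without `os.fork`.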


## Method 3: Program checkpoint on disk

Use a checkpointing tool ([CRIU](https://criu.org/)) to store the state of CPython
right after the libraries are imported.
Later, we can restore this checkpoint, which is very fast.

CRIU currently needs root access for dump/restore.
However, there is ongoing work to support a non-root option in https://github.com/checkpoint-restore/criu/pull/1930.

Or maybe [DMTCP](https://github.com/dmtcp/dmtcp/) is a better alternative to CRIU?

(Currently incomplete)


# Related work

https://github.com/gdb/pyseidon

            
