zipline-polygon-bundle


| Field | Value |
| --- | --- |
| Name | zipline-polygon-bundle |
| Version | 0.1.5 |
| Summary | A zipline-reloaded data provider bundle for Polygon.io |
| Upload time | 2024-10-04 18:38:25 |
| Author | Jim White |
| Requires Python | `<4.0,>=3.9` |
| License | AGPL-3.0 |
| Keywords | zipline, data-bundle, finance |

# zipline-polygon-bundle
`zipline-polygon-bundle` is a `zipline-reloaded` (https://github.com/stefan-jansen/zipline-reloaded) data ingestion bundle for [Polygon.io](https://polygon.io/).

## GitHub
https://github.com/fovi-llc/zipline-polygon-bundle

## Resources

Get a subscription to https://polygon.io/ for an API key and access to flat files.

https://polygon.io/knowledge-base/article/how-to-get-started-with-s3

Quantopian's Zipline backtester revived by Stefan Jansen: https://github.com/stefan-jansen/zipline-reloaded

Stefan's excellent book *Machine Learning for Algorithmic Trading*: https://ml4trading.io/

*Trading Evolved* by Andreas Clenow is a gentler introduction to Zipline Reloaded: https://www.followingthetrend.com/trading-evolved/

Code from *Trading Evolved* with some small updates for convenience: https://github.com/fovi-llc/trading_evolved

One of the modifications I've made to that code is so that some of the notebooks can be run on Colab with a minimum of fuss: https://github.com/fovi-llc/trading_evolved/blob/main/Chapter%207%20-%20Backtesting%20Trading%20Strategies/First%20Zipline%20Backtest.ipynb

# Ingest data from Polygon.io into Zipline

## Set up your `rclone` (https://rclone.org/) configuration
```bash
export POLYGON_FILE_ENDPOINT=https://files.polygon.io/
rclone config create s3polygon s3 env_auth=false endpoint=$POLYGON_FILE_ENDPOINT \
  access_key_id=$POLYGON_S3_Access_ID secret_access_key=$POLYGON_Secret_Access_Key
```

## Get flat files (`*.csv.gz`) for US Stock daily aggregates.
The default asset directory is `us_stocks_sip`, but that can be overridden with the `POLYGON_ASSET_SUBDIR`
environment variable if/when Polygon.io adds other markets to flat files.

```bash
export POLYGON_DATA_DIR=`pwd`/data/files.polygon.io
for year in 2024 2023 2022 2021; do \
    rclone copy -P s3polygon:flatfiles/us_stocks_sip/day_aggs_v1/$year \
    $POLYGON_DATA_DIR/flatfiles/us_stocks_sip/day_aggs_v1/$year; \
done
```
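
Before ingesting, it can help to confirm that the copy landed where the bundle will look for it. The following is a minimal sketch and not part of the package: `count_flat_files` is a hypothetical helper that counts `*.csv.gz` files per year directory, assuming the directory layout produced by the `rclone` loop above.

```python
import os
from pathlib import Path


def count_flat_files(data_dir, asset_subdir="us_stocks_sip", agg_dir="day_aggs_v1"):
    """Return {year: number of *.csv.gz files} found under the aggregates directory.

    Hypothetical helper for illustration. It only inspects the local copy
    made by rclone and does not talk to Polygon.io.
    """
    root = Path(data_dir) / "flatfiles" / asset_subdir / agg_dir
    counts = {}
    if not root.is_dir():
        return counts
    for year_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob is used so the count does not depend on how files are
        # nested (for example by month) inside each year directory.
        counts[year_dir.name] = sum(1 for _ in year_dir.rglob("*.csv.gz"))
    return counts


if __name__ == "__main__":
    data_dir = os.environ.get("POLYGON_DATA_DIR", "data/files.polygon.io")
    for year, n in count_flat_files(data_dir).items():
        print(f"{year}: {n} files")
```

A year with far fewer files than the others usually means the copy was interrupted; rerunning the `rclone copy` loop should fetch only what is missing.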

## `extension.py`

```python
from zipline_polygon_bundle import register_polygon_equities_bundle

# All tickers (>20K) are ingested.  Filtering is TBD.
# `start_session` and `end_session` can be set to ingest a range of dates (which must be market days).
register_polygon_equities_bundle(
    "polygon",
    calendar_name="XNYS",
    agg_time="day"
)
```
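
The bundle listing shown in the install step includes a `polygon-minute` bundle, which the `extension.py` above does not register. A minimal sketch of how that registration might look follows. The value `agg_time="minute"` is an assumption inferred from the `agg_time="day"` argument shown above, so check it against the package source before relying on it.

```python
from zipline_polygon_bundle import register_polygon_equities_bundle

# Daily bars, exactly as in the example above.
register_polygon_equities_bundle(
    "polygon",
    calendar_name="XNYS",
    agg_time="day",
)

# Minute bars. ASSUMPTION: "minute" is the accepted value for agg_time;
# only "day" is confirmed by this README.
register_polygon_equities_bundle(
    "polygon-minute",
    calendar_name="XNYS",
    agg_time="minute",
)
```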

## Install the Zipline Polygon.io Bundle PyPI package and check that it works.
Listing bundles will show whether everything is working correctly.
```bash
pip install zipline_polygon_bundle
zipline -e extension.py bundles
```
stdout:
```
csvdir <no ingestions>
polygon <no ingestions>
polygon-minute <no ingestions>
quandl <no ingestions>
quantopian-quandl <no ingestions>
```

## Ingest the Polygon.io data.

The API key is needed for the split and dividend data.

Note that ingest currently stores cached API data and shuffled agg data in the `POLYGON_DATA_DIR` directory (`flatfiles/us_stocks_sip/api_cache` and `flatfiles/us_stocks_sip/day_by_ticker_v1` respectively) so write access is needed at this stage.  After ingestion the data in `POLYGON_DATA_DIR` is not accessed.

```bash
export POLYGON_API_KEY="<your API key here>"
zipline -e extension.py ingest -b polygon
```
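
Since ingest needs both the API key and write access to `POLYGON_DATA_DIR`, a quick check before starting a long run can save time. This is a minimal sketch under those two assumptions only: `preflight` is a hypothetical helper, not something the package provides, and it treats an unset `POLYGON_DATA_DIR` as a problem because the steps above export it explicitly.

```python
import os
from pathlib import Path


def preflight(env):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical check for illustration. `env` is any mapping of
    environment variables, for example os.environ.
    """
    problems = []
    if not env.get("POLYGON_API_KEY"):
        problems.append("POLYGON_API_KEY is not set (needed for splits and dividends)")
    data_dir = env.get("POLYGON_DATA_DIR")
    if not data_dir:
        problems.append("POLYGON_DATA_DIR is not set")
    elif not Path(data_dir).is_dir():
        problems.append(f"POLYGON_DATA_DIR does not exist: {data_dir}")
    elif not os.access(data_dir, os.W_OK):
        # Ingest writes its API cache and per-ticker data under this directory.
        problems.append(f"POLYGON_DATA_DIR is not writable: {data_dir}")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("WARNING:", problem)
```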

### Cleaning up bad ingests
After a while you may wind up with old bundles (or empty ones left by an error during ingestion) cluttering
up the list and wasting space, although old bundles may be useful for rerunning old backtests.
To remove all but the last ingest (say, after your first successful ingest following a number of false starts) you could use:
```bash
zipline -e extension.py clean -b polygon --keep-last 1
```

## Using minute aggregate flat files.
Minute aggs work too but everything takes more space and a lot longer to do.  

```bash
export POLYGON_DATA_DIR=`pwd`/data/files.polygon.io
for year in 2024 2023 2022 2021; do \
    rclone copy -P s3polygon:flatfiles/us_stocks_sip/minute_aggs_v1/$year \
    $POLYGON_DATA_DIR/flatfiles/us_stocks_sip/minute_aggs_v1/$year; \
done
```

If you set the `ZIPLINE_ROOT` environment variable (recommended and likely necessary because the default of `~/.zipline` is probably not what you'll want) and copy your `extension.py` config there then you don't need to put `-e extension.py` on the `zipline` command line.
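
Copying the config into place is a one-liner in the shell, but it can also be scripted. The sketch below assumes only what the paragraph above states, namely that Zipline picks up a file named `extension.py` from `ZIPLINE_ROOT`; `install_extension` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def install_extension(source, zipline_root):
    """Copy an extension file into zipline_root as extension.py.

    Hypothetical helper for illustration. Creates zipline_root if needed
    and returns the path of the installed copy.
    """
    root = Path(zipline_root).expanduser()
    root.mkdir(parents=True, exist_ok=True)
    target = root / "extension.py"
    shutil.copy2(source, target)
    return target
```

For example, `install_extension("extension.py", os.environ["ZIPLINE_ROOT"])` (with `import os`) would place the file so that `zipline bundles` works without the `-e` option.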

This ingestion for 10 years of minute bars took around 10 hours on my Mac using an external hard drive (not SSD).  A big chunk of that was copying from the default tmp dir to the Zipline root (6.3 million files for 47GB actual, 63GB used).  I plan to change that `shutil.copy2` to `shutil.move` and to use a `tmp` dir in the Zipline root for temporary files instead of the default, which should save an hour or two.  Also, the ingestion process is single-threaded and could be sped up with some concurrency.

```bash
zipline ingest -b polygon-minute
```
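
The planned switch from `shutil.copy2` to `shutil.move` with a `tmp` dir under the Zipline root can be illustrated in isolation. The sketch below is not the package's code: `build_then_publish` and its parameters are hypothetical, and it only shows the general pattern of staging output on the same filesystem as its destination, where `shutil.move` can rename the directory instead of copying every file.

```python
import shutil
import tempfile
from pathlib import Path


def build_then_publish(zipline_root, dest_name, write_output):
    """Stage files under zipline_root/tmp, then move them into place.

    write_output is a callable that receives the staging directory and
    writes whatever files it wants there. All names here are hypothetical.
    """
    root = Path(zipline_root)
    tmp_parent = root / "tmp"
    tmp_parent.mkdir(parents=True, exist_ok=True)
    final_dir = root / dest_name
    if final_dir.exists():
        # shutil.move would nest the staging dir inside an existing directory.
        raise FileExistsError(final_dir)

    # Staging inside zipline_root keeps source and destination on the same
    # filesystem, so the move is a rename rather than a file-by-file copy.
    staging = Path(tempfile.mkdtemp(dir=tmp_parent))
    try:
        write_output(staging)
        shutil.move(str(staging), str(final_dir))
    finally:
        # After a successful move the staging path no longer exists and this
        # is a no-op; after a failure it removes the partial output.
        shutil.rmtree(staging, ignore_errors=True)
    return final_dir
```

With millions of small files, avoiding the second copy is where the estimated saving of an hour or two would come from.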

# License is Affero General Public License v3 (AGPL v3)
The content of this project is Copyright (C) 2024 Fovi LLC and authored by James P. White (https://www.linkedin.com/in/jamespaulwhite/).  It is distributed under the terms of the GNU AFFERO GENERAL PUBLIC LICENSE (AGPL) Version 3 (See LICENSE file).

The AGPL doesn't put any restrictions on personal use, but people using this in a service for others have obligations.  If you have commercial purposes and those distribution requirements don't work for you, feel free to contact me (mailto:jim@fovi.com) about other licensing terms.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "zipline-polygon-bundle",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "zipline, data-bundle, finance",
    "author": "Jim White",
    "author_email": "jim@fovi.com",
    "download_url": "https://files.pythonhosted.org/packages/d6/c5/f701f640d658a7c8e633ac637e32d48847c7460c08df36acf3c6c7a5ea50/zipline_polygon_bundle-0.1.5.tar.gz",
    "platform": null,
    "description": "# zipline-polygon-bundle\n`zipline-polygon-bundle` is a `zipline-reloaded` (https://github.com/stefan-jansen/zipline-reloaded) data ingestion bundle for [Polygon.io](https://polygon.io/).\n\n## GitHub\nhttps://github.com/fovi-llc/zipline-polygon-bundle\n\n## Resources\n\nGet a subscription to https://polygon.io/ for an API key and access to flat files.\n\nhttps://polygon.io/knowledge-base/article/how-to-get-started-with-s3\n\nQuantopian's Zipline backtester revived by Stefan Jansen: https://github.com/stefan-jansen/zipline-reloaded\n\nStefan's excellent book *Machine Learning for Algorithmic Trading*: https://ml4trading.io/\n\n*Trading Evolved* by Andreas Clenow is a gentler introduction to Zipline Reloaded: https://www.followingthetrend.com/trading-evolved/\n\nCode from *Trading Evolved* with some small updates for convenience: https://github.com/fovi-llc/trading_evolved\n\nOne of the modifications I've made to that code is so that some of the notebooks can be run on Colab with a minimum of fuss: https://github.com/fovi-llc/trading_evolved/blob/main/Chapter%207%20-%20Backtesting%20Trading%20Strategies/First%20Zipline%20Backtest.ipynb\n\n# Ingest data from Polygon.io into Zipline\n\n## Set up your `rclone` (https://rclone.org/) configuration\n```bash\nexport POLYGON_FILE_ENDPOINT=https://files.polygon.io/\nrclone config create s3polygon s3 env_auth=false endpoint=$POLYGON_FILE_ENDPOINT \\\n  access_key_id=$POLYGON_S3_Access_ID secret_access_key=$POLYGON_Secret_Access_Key\n```\n\n## Get flat files (`*.csv.gz`) for US Stock daily aggregates.\nThe default asset dir is `us_stock_sip` but that can be overriden with the `POLYGON_ASSET_SUBDIR` \nenvironment variable if/when Polygon.io adds other markets to flat files.\n\n```bash\nexport POLYGON_DATA_DIR=`pwd`/data/files.polygon.io\nfor year in 2024 2023 2022 2021; do \\\n    rclone copy -P s3polygon:flatfiles/us_stocks_sip/day_aggs_v1/$year \\\n    
$POLYGON_DATA_DIR/flatfiles/us_stocks_sip/day_aggs_v1/$year; \\\ndone\n```\n\n## `extension.py`\n\n```python\nfrom zipline_polygon_bundle import register_polygon_equities_bundle\n\n# All tickers (>20K) are ingested.  Filtering is TBD.\n# `start_session` and `end_session` can be set to ingest a range of dates (which must be market days).\nregister_polygon_equities_bundle(\n    \"polygon\",\n    calendar_name=\"XNYS\",\n    agg_time=\"day\"\n)\n```\n\n## Install the Zipline Polygon.io Bundle PyPi package and check that it works.\nListing bundles will show if everything is working correctly.\n```bash\npip install zipline_polygon_bundle\nzipline -e extension.py bundles\n```\nstdout:\n```\ncsvdir <no ingestions>\npolygon <no ingestions>\npolygon-minute <no ingestions>\nquandl <no ingestions>\nquantopian-quandl <no ingestions>\n```\n\n## Ingest the Polygon.io data.  The API key is needed for the split and dividend data.\n\nNote that ingest currently stores cached API data and shuffled agg data in the `POLYGON_DATA_DIR` directory (`flatfiles/us_stocks_sip/api_cache` and `flatfiles/us_stocks_sip/day_by_ticker_v1` respectively) so write access is needed at this stage.  After ingestion the data in `POLYGON_DATA_DIR` is not accessed.\n\n```bash\nexport POLYGON_API_KEY=<your API key here>\nzipline -e extension.py ingest -b polygon\n```\n\n### Cleaning up bad ingests\nAfter a while you may wind up with old (or empty because of an error during ingestion) bundles cluttering\nup the list and could waste space (although old bundles may be useful for rerunning old backtests).\nTo remove all but the last ingest (say after your first successful ingest after a number of false starts) you could use:\n```bash\nzipline -e extension.py clean -b polygon --keep-last 1\n```\n\n## Using minute aggregate flat files.\nMinute aggs work too but everything takes more space and a lot longer to do.  
\n\n```bash\nexport POLYGON_DATA_DIR=`pwd`/data/files.polygon.io\nfor year in 2024 2023 2022 2021; do \\\n    rclone copy -P s3polygon:flatfiles/us_stocks_sip/minute_aggs_v1/$year \\\n    $POLYGON_DATA_DIR/flatfiles/us_stocks_sip/minute_aggs_v1/$year; \\\ndone\n```\n\nIf you set the `ZIPLINE_ROOT` environment variable (recommended and likely necessary because the default of `~/.zipline` is probably not what you'll want) and copy your `extension.py` config there then you don't need to put `-e extension.py` on the `zipline` command line.\n\nThis ingestion for 10 years of minute bars took around 10 hours on my Mac using an external hard drive (not SSD).  A big chunk of that was copying from the default tmp dir to the Zipline root (6.3million files for 47GB actual, 63GB used).  I plan to change that `shutil.copy2` to use `shutil.move` and to use a `tmp` dir in Zipline root for temporary files instead of the default which should save an hour or two.  Also the ingestion process is single threaded and could be sped up with some concurrency.\n\n```bash\nzipline ingest -b polygon-minute\n```\n\n# License is Affero General Public License v3 (AGPL v3)\nThe content of this project is Copyright (C) 2024 Fovi LLC and authored by James P. White (https://www.linkedin.com/in/jamespaulwhite/).  It is distributed under the terms of the GNU AFFERO GENERAL PUBLIC LICENSE (AGPL) Version 3 (See LICENSE file).\n\nThe AGPL doesn't put any restrictions on personal use but people using this in a service for others have obligations.  If you have commerical purposes and those distribution requirements don't work for you, feel free to contact me (mailto:jim@fovi.com) about other licensing terms.\n",
    "bugtrack_url": null,
    "license": "AGPL-3.0",
    "summary": "A zipline-reloaded data provider bundle for Polygon.io",
    "version": "0.1.5",
    "project_urls": null,
    "split_keywords": [
        "zipline",
        " data-bundle",
        " finance"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "78a4cfbef2d69f4c6fb9e1cbdbbb54599ac0a7e36e8adf8da4fbf5087a54a29c",
                "md5": "5f32b79b784af014fa2284f47c776efe",
                "sha256": "9f2964815f3d2d6804d16a54713a5335958bbe341a43bac0667a36826f3c9f61"
            },
            "downloads": -1,
            "filename": "zipline_polygon_bundle-0.1.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5f32b79b784af014fa2284f47c776efe",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 37826,
            "upload_time": "2024-10-04T18:38:23",
            "upload_time_iso_8601": "2024-10-04T18:38:23.809115Z",
            "url": "https://files.pythonhosted.org/packages/78/a4/cfbef2d69f4c6fb9e1cbdbbb54599ac0a7e36e8adf8da4fbf5087a54a29c/zipline_polygon_bundle-0.1.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d6c5f701f640d658a7c8e633ac637e32d48847c7460c08df36acf3c6c7a5ea50",
                "md5": "912dbdd82ddac4a65cbcccb90628942d",
                "sha256": "f948a1aac760a623ac5bbb16893ca1933ae4ae9b4a89f9bf32097bff23e39f12"
            },
            "downloads": -1,
            "filename": "zipline_polygon_bundle-0.1.5.tar.gz",
            "has_sig": false,
            "md5_digest": "912dbdd82ddac4a65cbcccb90628942d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 33559,
            "upload_time": "2024-10-04T18:38:25",
            "upload_time_iso_8601": "2024-10-04T18:38:25.587353Z",
            "url": "https://files.pythonhosted.org/packages/d6/c5/f701f640d658a7c8e633ac637e32d48847c7460c08df36acf3c6c7a5ea50/zipline_polygon_bundle-0.1.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-04 18:38:25",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "zipline-polygon-bundle"
}
        
Elapsed time: 1.01075s