PyStore

Name: PyStore
Version: 0.1.24
Home page: https://github.com/ranaroussi/pystore
Summary: Fast data store for Pandas timeseries data
Author: Ran Aroussi
License: Apache Software License
Keywords: dask, datastore, flatfile, pystore
Upload time: 2024-07-10 10:16:13
Requirements: python-snappy>=0.5.4, toolz>=0.10.0, partd>=1.0.0, cloudpickle>=1.2.1,
    distributed>=1.28.1, pandas>=0.24.0, numpy>=1.17.3, pyarrow>=15.0.2,
    dask>=2024.1.0, multitasking>=0.0.9

PyStore - Fast data store for Pandas timeseries data
=====================================================

.. image:: https://img.shields.io/badge/python-2.7,%203.5+-blue.svg?style=flat
    :target: https://pypi.python.org/pypi/pystore
    :alt: Python version

.. image:: https://img.shields.io/pypi/v/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi version

.. image:: https://img.shields.io/pypi/status/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi status

.. image:: https://img.shields.io/travis/ranaroussi/pystore/master.svg?maxAge=1
    :target: https://travis-ci.com/ranaroussi/pystore
    :alt: Travis-CI build status

.. image:: https://www.codefactor.io/repository/github/ranaroussi/pystore/badge
    :target: https://www.codefactor.io/repository/github/ranaroussi/pystore
    :alt: CodeFactor

.. image:: https://img.shields.io/github/stars/ranaroussi/pystore.svg?style=social&label=Star&maxAge=60
    :target: https://github.com/ranaroussi/pystore
    :alt: Star this repo

.. image:: https://img.shields.io/twitter/follow/aroussi.svg?style=social&label=Follow&maxAge=60
    :target: https://twitter.com/aroussi
    :alt: Follow me on twitter

\


`PyStore <https://github.com/ranaroussi/pystore>`_ is a simple (yet powerful)
datastore for Pandas dataframes, and while it can store any Pandas object,
**it was designed with storing timeseries data in mind**.

It's built on top of `Pandas <http://pandas.pydata.org>`_, `Numpy <http://numpy.pydata.org>`_,
`Dask <http://dask.pydata.org>`_, and `Parquet <http://parquet.apache.org>`_
(via `pyarrow <https://github.com/apache/arrow>`_),
to provide an easy-to-use datastore for Python developers that can query
millions of rows per second per client.


==> Check out `this Blog post <https://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2>`_
for the reasoning and philosophy behind PyStore, as well as a detailed tutorial with code examples.

==> Follow `this PyStore tutorial <https://github.com/ranaroussi/pystore/blob/master/examples/pystore-tutorial.ipynb>`_ in Jupyter notebook format.


Quickstart
==========

Install PyStore
---------------

Install using `pip`:

.. code:: bash

    $ pip install pystore --upgrade --no-cache-dir

Install using `conda`:

.. code:: bash

    $ conda install -c ranaroussi pystore

**INSTALLATION NOTE:**
If you don't have Snappy (a compression/decompression library) installed,
you'll need to `install it first <https://github.com/ranaroussi/pystore#dependencies>`_.


Using PyStore
-------------

.. code:: python

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import pystore
    import quandl

    # Set storage path (optional)
    # Defaults to `~/pystore` or `PYSTORE_PATH` environment variable (if set)
    pystore.set_path("~/pystore")

    # List stores
    pystore.list_stores()

    # Connect to the datastore (create it if it doesn't exist)
    store = pystore.store('mydatastore')

    # List existing collections
    store.list_collections()

    # Access a collection (create it if it doesn't exist)
    collection = store.collection('NASDAQ')

    # List items in collection
    collection.list_items()

    # Load some data from Quandl
    aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

    # Store the first 100 rows of the data in the collection under "AAPL"
    collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

    # Reading the item's data
    item = collection.item('AAPL')
    data = item.data  # <-- Dask dataframe (see dask.pydata.org)
    metadata = item.metadata
    df = item.to_pandas()

    # Append the rest of the rows to the "AAPL" item
    collection.append('AAPL', aapl[100:])

    # Reading the item's data
    item = collection.item('AAPL')
    data = item.data
    metadata = item.metadata
    df = item.to_pandas()
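
    # (Illustrative addition; a hedged sketch, not part of the original example.)
    # Because `item.data` is a Dask dataframe, you can slice it lazily and
    # materialize only the rows you need. The date range below is hypothetical
    # and assumes the item has a sorted DatetimeIndex with known divisions.
    recent = item.data.loc['2017-01-01':'2017-12-31']  # lazy selection
    recent_df = recent.compute()  # reads only the partitions that are needed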


    # --- Query functionality ---

    # Query available symbols based on metadata
    collection.list_items(some_key='some_value', other_key='other_value')


    # --- Snapshot functionality ---

    # Snapshot a collection
    # (Point-in-time named reference for all current symbols in a collection)
    collection.create_snapshot('snapshot_name')

    # List available snapshots
    collection.list_snapshots()

    # Get a version of a symbol given a snapshot name
    collection.item('AAPL', snapshot='snapshot_name')

    # Delete a collection snapshot
    collection.delete_snapshot('snapshot_name')


    # ...


    # Delete the item from the current version
    collection.delete_item('AAPL')

    # Delete the collection
    store.delete_collection('NASDAQ')


Using Dask schedulers
---------------------

PyStore 0.1.18+ supports using Dask distributed.

To use a local Dask scheduler, add this to your code:

.. code:: python

    from dask.distributed import LocalCluster
    pystore.set_client(LocalCluster())


To use a distributed Dask scheduler, add this to your code:

.. code:: python

    pystore.set_client("tcp://xxx.xxx.xxx.xxx:xxxx")
    pystore.set_path("/path/to/shared/volume/all/workers/can/access")



Concepts
========

PyStore provides namespaced *collections* of data.
These collections allow bucketing data by *source*, *user*, or some other criterion
(for example, frequency: End-Of-Day, Minute Bars, etc.). Each collection (or namespace)
maps to a directory containing partitioned **parquet files** for each item (e.g. symbol).

A good practice is to create collections that look something like this (see the sketch below):

* collection.EOD
* collection.ONEMINUTE
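
A minimal sketch of such a layout (the tiny dataframe below is a hypothetical
stand-in; the collection names come from the list above and the calls simply
mirror the Quickstart):

.. code:: python

    import pandas as pd
    import pystore

    store = pystore.store('mydatastore')

    # One collection per data frequency
    eod = store.collection('EOD')
    minute = store.collection('ONEMINUTE')

    # A tiny stand-in dataframe, purely for illustration
    sample = pd.DataFrame(
        {'close': [101.2, 102.5]},
        index=pd.to_datetime(['2024-01-02', '2024-01-03'])
    )

    # Each item is written as partitioned parquet files
    # under the collection's directory
    eod.write('AAPL', sample, metadata={'source': 'example'})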

Requirements
============

* Python 2.7 or Python 3.5+
* Pandas
* Numpy
* Dask
* Pyarrow
* `Snappy <http://google.github.io/snappy/>`_ (Google's compression/decompression library)
* multitasking

PyStore was tested to work on \*nix-like systems, including macOS.


Dependencies:
-------------

PyStore uses `Snappy <http://google.github.io/snappy/>`_,
a fast and efficient compression/decompression library from Google.
You'll need to install Snappy on your system before installing PyStore.

\* See the ``python-snappy`` `Github repo <https://github.com/andrix/python-snappy#dependencies>`_ for more information.

**\*nix Systems:**

- APT: ``sudo apt-get install libsnappy-dev``
- RPM: ``sudo yum install libsnappy-devel``

**macOS:**

First, install Snappy's C library using `Homebrew <https://brew.sh>`_:

.. code::

    $ brew install snappy

Then, install the Python bindings (``python-snappy``) using conda:

.. code::

    $ conda install python-snappy -c conda-forge

...or, using `pip`:

.. code::

    $ CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install python-snappy


**Windows:**

Windows users should check out `Snappy for Windows <https://snappy.machinezoo.com>`_ and `this Stack Overflow post <https://stackoverflow.com/a/43756412/1783569>`_ for help installing Snappy and ``python-snappy``.


Roadmap
=======

PyStore currently supports the local filesystem (including attached network drives).
I plan on adding support for Amazon S3 (via `s3fs <http://s3fs.readthedocs.io/>`_),
Google Cloud Storage (via `gcsfs <https://github.com/dask/gcsfs/>`_)
and Hadoop Distributed File System (via `hdfs3 <http://hdfs3.readthedocs.io/>`_) in the future.

Acknowledgements
================

PyStore is hugely inspired by `Man AHL <http://www.ahl.com/>`_'s
`Arctic <https://github.com/manahl/arctic>`_, which uses
MongoDB for storage and allows for versioning, among other features.
I highly recommend you check it out.



License
=======


PyStore is licensed under the **Apache License, Version 2.0**, a copy of which is included in LICENSE.txt.

-----

I'm very interested in your experience with PyStore.
Please drop me a note with any feedback you have.

Contributions welcome!

\- **Ran Aroussi**

            
