:Name: pytorch-datastream
:Version: 0.4.10
:Home page: https://github.com/nextml-code/pytorch-datastream
:Summary: Simple dataset to dataloader library for pytorch
:Upload time: 2023-11-28 15:12:43
:Author: NextML
:Requires Python: >=3.8,<4.0
:License: Apache-2.0
:Keywords: pytorch, machine learning, dataset, pipeline, dataloader

==================
Pytorch Datastream
==================

.. image:: https://badge.fury.io/py/pytorch-datastream.svg
       :target: https://badge.fury.io/py/pytorch-datastream

.. image:: https://img.shields.io/pypi/pyversions/pytorch-datastream.svg
       :target: https://pypi.python.org/pypi/pytorch-datastream

.. image:: https://readthedocs.org/projects/pytorch-datastream/badge/?version=latest
       :target: https://pytorch-datastream.readthedocs.io/en/latest/?badge=latest

.. image:: https://img.shields.io/pypi/l/pytorch-datastream.svg
       :target: https://pypi.python.org/pypi/pytorch-datastream



This is a simple library for creating readable dataset pipelines and
reusing best practices for issues such as imbalanced datasets. There are
just two components to keep track of: ``Dataset`` and ``Datastream``.

``Dataset`` is a simple mapping between an index and an example. It lets you
chain functions in a readable pipeline syntax originally adapted from
TensorFlow 2's ``tf.data.Dataset``.
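
The index-to-example mapping with chained functions can be sketched in plain
Python. ``SketchDataset`` below is a hypothetical stand-in written for
illustration only. It is not the library's ``Dataset`` class and it mimics
nothing beyond a lazy ``map`` step.

.. code-block:: python

    from typing import Any, Callable, Sequence


    class SketchDataset:
        """Hypothetical stand-in: maps an index to an example, lazily."""

        def __init__(
            self,
            source: Sequence[Any],
            fn: Callable[[Any], Any] = lambda x: x,
        ):
            self.source = source
            self.fn = fn

        def __len__(self) -> int:
            return len(self.source)

        def __getitem__(self, index: int) -> Any:
            # The chained functions run only when an example is requested.
            return self.fn(self.source[index])

        def map(self, fn: Callable[[Any], Any]) -> "SketchDataset":
            # Compose the new function after the existing one.
            previous = self.fn
            return SketchDataset(self.source, lambda x: fn(previous(x)))


    fruit_and_cost = [("apple", 5), ("pear", 7), ("banana", 14)]
    dataset = (
        SketchDataset(fruit_and_cost)
        .map(lambda example: (example[0], example[1] * 2))
        .map(lambda example: f"{example[0]}: {example[1]}")
    )
    example = dataset[2]  # "banana: 28"

Each ``map`` call returns a new object, so intermediate pipelines can be
reused without affecting one another.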

``Datastream`` combines a ``Dataset`` and a sampler into a stream of examples.
It provides a simple solution to oversampling / stratification, weighted
sampling, and conversion to a ``torch.utils.data.DataLoader``.
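
As a rough illustration of what combining examples with a weighted sampler
means, the following plain-Python sketch draws indices in proportion to
per-example weights. The function ``weighted_stream`` and its parameters are
assumptions made for this example. The library's own sampler is not shown
here and may work differently.

.. code-block:: python

    import random
    from typing import Any, Iterator, Sequence


    def weighted_stream(
        examples: Sequence[Any],
        weights: Sequence[float],
        seed: int = 0,
    ) -> Iterator[Any]:
        """Yield examples forever, drawing each index by its weight."""
        rng = random.Random(seed)
        indices = range(len(examples))
        while True:
            (index,) = rng.choices(indices, weights=weights, k=1)
            yield examples[index]


    labels = ["rare", "common", "common", "common"]
    # Oversample the rare class: it gets as much total weight as the rest.
    stream = weighted_stream(labels, weights=[3.0, 1.0, 1.0, 1.0])
    batch = [next(stream) for _ in range(1000)]
    rare_fraction = batch.count("rare") / len(batch)  # roughly one half

Because the stream never ends, the number of examples per epoch becomes a
choice made by the consumer rather than a property of the data.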

Install
=======

.. code-block::

    poetry add pytorch-datastream

Or, for the old-timers:

.. code-block::

    pip install pytorch-datastream

Usage
=====

The list below showcases functions that are useful in most standard and
non-standard cases. It is not exhaustive. See the
`documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_ for
a more complete API reference and usage guide.

.. code-block:: python

    Dataset.from_subscriptable
    Dataset.from_dataframe
    Dataset
        .map
        .subset
        .split
        .cache
        .with_columns

    Datastream.merge
    Datastream.zip
    Datastream
        .map
        .data_loader
        .zip_index
        .update_weights_
        .update_example_weight_
        .weight
        .state_dict
        .load_state_dict

Merge / stratify / oversample datastreams
-----------------------------------------
Each fruit datastream used below repeatedly yields the string naming its
fruit type.

.. code-block:: python

    >>> datastream = Datastream.merge([
    ...     (apple_datastream, 2),
    ...     (pear_datastream, 1),
    ...     (banana_datastream, 1),
    ... ])
    >>> next(iter(datastream.data_loader(batch_size=8)))
    ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
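
The interleaving pattern in the output above can be reproduced with a small
plain-Python sketch. ``merge_round_robin`` is a hypothetical helper written
for this illustration. It only mirrors the 2:1:1 pattern shown and should not
be read as how ``Datastream.merge`` is implemented.

.. code-block:: python

    from itertools import cycle, islice
    from typing import Any, Iterable, Iterator, List, Tuple


    def merge_round_robin(
        streams_and_counts: List[Tuple[Iterable[Any], int]],
    ) -> Iterator[Any]:
        """Take ``count`` items from each stream in turn, forever.

        Assumes every stream is infinite, like the fruit streams above.
        """
        iterators = [(iter(s), count) for s, count in streams_and_counts]
        while True:
            for iterator, count in iterators:
                for _ in range(count):
                    yield next(iterator)


    merged = merge_round_robin([
        (cycle(["apple"]), 2),
        (cycle(["pear"]), 1),
        (cycle(["banana"]), 1),
    ])
    batch = list(islice(merged, 8))
    # ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']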

Zip independently sampled datastreams
-------------------------------------
Each fruit datastream used below repeatedly yields the string naming its
fruit type.

.. code-block:: python

    >>> datastream = Datastream.zip([
    ...     apple_datastream,
    ...     Datastream.merge([pear_datastream, banana_datastream]),
    ... ])
    >>> next(iter(datastream.data_loader(batch_size=4)))
    [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
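
For intuition, the same pairing can be sketched with the standard library.
Here ``cycle(["pear", "banana"])`` is an assumed stand-in for the merged
pear and banana stream. The sketch illustrates the pairing only, not the
independent sampling that ``Datastream.zip`` performs.

.. code-block:: python

    from itertools import cycle, islice

    apples = cycle(["apple"])
    # Stand-in for the merged stream: alternates between the two fruits.
    pears_and_bananas = cycle(["pear", "banana"])

    # Pair one example from each stream at every step.
    zipped = zip(apples, pears_and_bananas)
    batch = list(islice(zipped, 4))
    # [('apple', 'pear'), ('apple', 'banana'),
    #  ('apple', 'pear'), ('apple', 'banana')]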

More usage examples
-------------------
See the `documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_
for more usage examples.

Install from source
===================

.. code-block::

    pip install -e .

            
