==================
Pytorch Datastream
==================
.. image:: https://badge.fury.io/py/pytorch-datastream.svg
    :target: https://badge.fury.io/py/pytorch-datastream

.. image:: https://img.shields.io/pypi/pyversions/pytorch-datastream.svg
    :target: https://pypi.python.org/pypi/pytorch-datastream

.. image:: https://readthedocs.org/projects/pytorch-datastream/badge/?version=latest
    :target: https://pytorch-datastream.readthedocs.io/en/latest/?badge=latest

.. image:: https://img.shields.io/pypi/l/pytorch-datastream.svg
    :target: https://pypi.python.org/pypi/pytorch-datastream

This is a simple library for creating readable dataset pipelines and
reusing best practices for issues such as imbalanced datasets. There are
just two components to keep track of: ``Dataset`` and ``Datastream``.

``Dataset`` is a simple mapping between an index and an example. It provides
pipelining of functions in a readable syntax originally adapted from
TensorFlow 2's ``tf.data.Dataset``.

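Conceptually, a ``Dataset`` is just a length plus an index-to-example function, with ``map`` composing lazily on top. The sketch below is illustrative plain Python (a hypothetical ``SketchDataset``, not the library's actual implementation):

```python
# Minimal conceptual sketch of an index -> example mapping with lazy,
# chainable map. Illustrative only; not pytorch-datastream's internals.
class SketchDataset:
    def __init__(self, source, fn=lambda x: x):
        self.source = source  # anything supporting __getitem__ / __len__
        self.fn = fn          # the accumulated pipeline of mapped functions

    def __len__(self):
        return len(self.source)

    def __getitem__(self, index):
        # The pipeline runs only when an example is actually requested.
        return self.fn(self.source[index])

    def map(self, fn):
        # Compose the new function on top of the existing pipeline.
        return SketchDataset(self.source, lambda x: fn(self.fn(x)))


squares = SketchDataset([1, 2, 3]).map(lambda x: x * x).map(str)
print([squares[i] for i in range(len(squares))])  # ['1', '4', '9']
```

Nothing is computed until ``__getitem__`` is called, which is what makes long, readable pipelines cheap to build.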
``Datastream`` combines a ``Dataset`` and a sampler into a stream of examples.
It provides a simple solution to oversampling / stratification, weighted
sampling, and finally converting to a ``torch.utils.data.DataLoader``.
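In the same spirit, a datastream can be pictured as a dataset paired with a (possibly weighted) sampler, yielding an endless stream of examples. A plain-Python sketch of that idea, using the standard library's ``random.choices`` rather than the library's actual sampler:

```python
import random


def sketch_datastream(dataset, weights, seed=0):
    """Yield examples forever, sampling indices in proportion to weights.

    Conceptual stand-in for a Datastream; not the library's implementation.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    indices = range(len(dataset))
    while True:
        yield dataset[rng.choices(indices, weights=weights)[0]]


stream = sketch_datastream(["apple", "pear"], weights=[3, 1])
batch = [next(stream) for _ in range(4)]  # apples roughly 3x as often
```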

Install
=======

.. code-block::

    poetry add pytorch-datastream

Or, for the old-timers:

.. code-block::

    pip install pytorch-datastream

Usage
=====

The list below is meant to showcase functions that are useful in most standard
and non-standard cases. It is not meant to be an exhaustive list. See the
`documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_ for
a more extensive list of the API and usage.

.. code-block:: python

    Dataset.from_subscriptable
    Dataset.from_dataframe
    Dataset
        .map
        .subset
        .split
        .cache
        .with_columns

    Datastream.merge
    Datastream.zip
    Datastream
        .map
        .data_loader
        .zip_index
        .update_weights_
        .update_example_weight_
        .weight
        .state_dict
        .load_state_dict
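The ``.weight`` / ``.update_example_weight_`` family above points at per-example sampling weights that can be adjusted during training (for example, to focus on hard examples). The sketch below shows that idea in plain Python with assumed semantics; names like ``SketchWeightedSampler`` are hypothetical, so see the documentation for the real behavior:

```python
import random


class SketchWeightedSampler:
    """Hypothetical sketch of mutable per-example sampling weights."""

    def __init__(self, n_examples, seed=0):
        self.weights = [1.0] * n_examples  # start with uniform sampling
        self.rng = random.Random(seed)

    def sample(self):
        # Draw one index in proportion to the current weights.
        return self.rng.choices(range(len(self.weights)), weights=self.weights)[0]

    def update_example_weight(self, index, weight):
        # In-place update, e.g. to oversample a hard example.
        self.weights[index] = weight


sampler = SketchWeightedSampler(3)
sampler.update_example_weight(2, 10.0)  # make example 2 much more likely
draws = [sampler.sample() for _ in range(5)]
```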

Merge / stratify / oversample datastreams
-----------------------------------------

Each fruit datastream below repeatedly yields the name of its fruit type.

.. code-block:: python

    >>> datastream = Datastream.merge([
    ...     (apple_datastream, 2),
    ...     (pear_datastream, 1),
    ...     (banana_datastream, 1),
    ... ])
    >>> next(iter(datastream.data_loader(batch_size=8)))
    ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
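With weights ``[2, 1, 1]``, examples are drawn from the merged streams in proportion to their weights. A deterministic plain-Python sketch of that interleaving (the ``sketch_merge`` helper is hypothetical, not the library's sampler) reproduces the batch above:

```python
import itertools


def sketch_merge(streams_with_counts):
    """Round-robin over streams, taking `count` items from each in turn."""
    streams = [(iter(stream), count) for stream, count in streams_with_counts]
    while True:
        for stream, count in streams:
            for _ in range(count):
                yield next(stream)


merged = sketch_merge([
    (itertools.repeat("apple"), 2),   # weight 2: two apples per round
    (itertools.repeat("pear"), 1),    # weight 1
    (itertools.repeat("banana"), 1),  # weight 1
])
print(list(itertools.islice(merged, 8)))
# ['apple', 'apple', 'pear', 'banana', 'apple', 'apple', 'pear', 'banana']
```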

Zip independently sampled datastreams
-------------------------------------

Each fruit datastream below repeatedly yields the name of its fruit type.

.. code-block:: python

    >>> datastream = Datastream.zip([
    ...     apple_datastream,
    ...     Datastream.merge([pear_datastream, banana_datastream]),
    ... ])
    >>> next(iter(datastream.data_loader(batch_size=4)))
    [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
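The key point is that each zipped stream is sampled independently, so the tuples pair one example from each stream per step. A plain-Python sketch with ``itertools`` stand-ins (not the library's implementation) reproduces the batch above:

```python
import itertools

# Stand-ins for the datastreams: one endless apple stream, and an
# alternating pear/banana stream standing in for the merged pair.
apples = itertools.repeat("apple")
pears_bananas = itertools.cycle(["pear", "banana"])

# Each step takes one example from every stream, independently sampled.
zipped = zip(apples, pears_bananas)
print(list(itertools.islice(zipped, 4)))
# [('apple', 'pear'), ('apple', 'banana'), ('apple', 'pear'), ('apple', 'banana')]
```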

More usage examples
-------------------

See the `documentation <https://pytorch-datastream.readthedocs.io/en/latest/>`_
for more usage examples.

Install from source
===================

.. code-block::

    pip install -e .