pandas-streaming: streaming API over pandas
===========================================

.. image:: https://ci.appveyor.com/api/projects/status/4te066r8ne1ymmhy?svg=true
    :target: https://ci.appveyor.com/project/sdpython/pandas-streaming
    :alt: Build Status Windows

.. image:: https://dev.azure.com/xavierdupre3/pandas_streaming/_apis/build/status/sdpython.pandas_streaming
    :target: https://dev.azure.com/xavierdupre3/pandas_streaming/

.. image:: https://badge.fury.io/py/pandas_streaming.svg
    :target: http://badge.fury.io/py/pandas_streaming

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
    :alt: MIT License
    :target: https://opensource.org/license/MIT/

.. image:: https://codecov.io/gh/sdpython/pandas-streaming/branch/main/graph/badge.svg?token=0caHX1rhr8
    :target: https://codecov.io/gh/sdpython/pandas-streaming

.. image:: http://img.shields.io/github/issues/sdpython/pandas_streaming.png
    :alt: GitHub Issues
    :target: https://github.com/sdpython/pandas_streaming/issues

.. image:: https://pepy.tech/badge/pandas_streaming/month
    :target: https://pepy.tech/project/pandas_streaming/month
    :alt: Downloads

.. image:: https://img.shields.io/github/forks/sdpython/pandas_streaming.svg
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: Forks

.. image:: https://img.shields.io/github/stars/sdpython/pandas_streaming.svg
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: Stars

.. image:: https://img.shields.io/github/repo-size/sdpython/pandas_streaming
    :target: https://github.com/sdpython/pandas_streaming/
    :alt: size


`pandas-streaming <https://sdpython.github.io/doc/pandas-streaming/dev/>`_
processes large files with `pandas <https://pandas.pydata.org/>`_
when the data is too big to hold in memory yet too small
for parallelization to bring a significant gain.
The module replicates a subset of the *pandas* API
and implements additional functionality for machine learning.

.. code-block:: python

    from pandas_streaming.df import StreamingDataFrame

    sdf = StreamingDataFrame.read_csv("filename", sep="\t", encoding="utf-8")

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)

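
To illustrate why chunk-by-chunk iteration helps, the sketch below computes a column mean without ever holding the whole table in memory. It uses plain pandas (``read_csv`` with ``chunksize``) rather than ``StreamingDataFrame``; the in-memory CSV, the column name ``value``, and the helper ``streaming_mean`` are assumptions made for this example only.

```python
import io

import pandas


def streaming_mean(chunks, column):
    """Mean of `column` over an iterable of dataframes, one chunk at a time."""
    total = 0.0
    count = 0
    for chunk in chunks:
        # Only running totals are kept; each chunk can be discarded afterwards.
        total += chunk[column].sum()
        count += len(chunk)
    return total / count if count else float("nan")


# Hypothetical data: a small in-memory CSV stands in for a large file.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

# chunksize makes read_csv return an iterator of dataframes (4 rows each here).
chunks = pandas.read_csv(io.StringIO(csv_text), chunksize=4)
mean_value = streaming_mean(chunks, "value")
print(mean_value)
```

The same accumulation pattern should apply to any iterable of dataframes, since the helper only relies on iteration.
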
The module can also stream an existing dataframe.

.. code-block:: python

    import pandas
    from pandas_streaming.df import StreamingDataFrame

    df = pandas.DataFrame([dict(cf=0, cint=0, cstr="0"),
                           dict(cf=1, cint=1, cstr="1"),
                           dict(cf=3, cint=3, cstr="3")])

    sdf = StreamingDataFrame.read_df(df)

    for df in sdf:
        # process this chunk of data
        # df is a dataframe
        print(df)

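
Conceptually, streaming an existing dataframe amounts to yielding consecutive row slices. The following is a minimal sketch in plain pandas, not the library's implementation; ``iter_chunks`` and ``chunk_size`` are hypothetical names introduced for illustration.

```python
import pandas


def iter_chunks(df, chunk_size):
    """Yield consecutive row slices of `df`, each with at most `chunk_size` rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]


df = pandas.DataFrame({"cf": [0.0, 1.0, 3.0],
                       "cint": [0, 1, 3],
                       "cstr": ["0", "1", "3"]})

# Three rows split into chunks of at most two rows.
sizes = [len(chunk) for chunk in iter_chunks(df, chunk_size=2)]
print(sizes)
```

Concatenating the yielded slices in order gives back the original dataframe, so no row is lost or duplicated.
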

The module also contains helpers that split datasets into
train and test sets under unusual constraints.

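
The text does not list the constraints. One plausible example, assumed here purely for illustration, is keeping every row that shares a key on the same side of the split. The sketch below implements that idea with plain pandas and numpy; ``group_train_test_split``, the ``user`` column, and the default parameters are hypothetical and are not taken from the library.

```python
import numpy
import pandas


def group_train_test_split(df, group_column, test_size=0.25, seed=0):
    """Split `df` so that each group lands entirely in train or entirely in test."""
    groups = df[group_column].unique()
    rng = numpy.random.default_rng(seed)
    shuffled = rng.permutation(groups)
    # Assumption: at least one group always goes to the test set.
    n_test = max(1, int(round(len(shuffled) * test_size)))
    test_groups = list(shuffled[:n_test])
    in_test = df[group_column].isin(test_groups)
    return df[~in_test], df[in_test]


# Hypothetical data: four users with two rows each.
data = pandas.DataFrame({
    "user": ["a", "a", "b", "b", "c", "c", "d", "d"],
    "x": range(8),
})
train, test = group_train_test_split(data, "user", test_size=0.25)
print(sorted(train["user"].unique()), sorted(test["user"].unique()))
```

Splitting on groups rather than rows avoids leaking information about one user from the train set into the test set.
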
Raw data
{
"_id": null,
"home_page": "https://github.com/sdpython/pandas-streaming",
"name": "pandas-streaming",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Xavier Dupr\u00e9",
"author_email": "xavier.dupre@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/6b/f0/b42921e2c35d7444fda7fa96b0dac34eecbd7c92e9de9e963ca002d14713/pandas_streaming-0.5.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Array (and numpy) API for ONNX",
"version": "0.5.1",
"project_urls": {
"Homepage": "https://github.com/sdpython/pandas-streaming"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "38bba7e7f01c416200ed03247c688e080d2b50bf697b3819aa368a63ba2c7cc2",
"md5": "e2e5adbdad3dfb8ba06c6aecb241116f",
"sha256": "5c5780742c8c6fcf86b7871caa9b4b52a6c10463ed2bd4cebcb6bda9d06c59cc"
},
"downloads": -1,
"filename": "pandas_streaming-0.5.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "e2e5adbdad3dfb8ba06c6aecb241116f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 36803,
"upload_time": "2024-09-14T16:54:34",
"upload_time_iso_8601": "2024-09-14T16:54:34.146714Z",
"url": "https://files.pythonhosted.org/packages/38/bb/a7e7f01c416200ed03247c688e080d2b50bf697b3819aa368a63ba2c7cc2/pandas_streaming-0.5.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6bf0b42921e2c35d7444fda7fa96b0dac34eecbd7c92e9de9e963ca002d14713",
"md5": "42933061542ed3d65536010c279fe184",
"sha256": "ad34c07cd271ea43832962f9eef9e16b3b8cd281748a55de95c2719fc7f7aae9"
},
"downloads": -1,
"filename": "pandas_streaming-0.5.1.tar.gz",
"has_sig": false,
"md5_digest": "42933061542ed3d65536010c279fe184",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 34386,
"upload_time": "2024-09-14T16:54:36",
"upload_time_iso_8601": "2024-09-14T16:54:36.009784Z",
"url": "https://files.pythonhosted.org/packages/6b/f0/b42921e2c35d7444fda7fa96b0dac34eecbd7c92e9de9e963ca002d14713/pandas_streaming-0.5.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-09-14 16:54:36",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "sdpython",
"github_project": "pandas-streaming",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"appveyor": true,
"requirements": [],
"lcname": "pandas-streaming"
}