vineyard-io


:Version: 0.21.3
:Summary: IO drivers for vineyard
:Home page: https://v6d.io
:Source: https://github.com/v6d-io/v6d
:Author: The vineyard team (developers@v6d.io)
:License: Apache License 2.0
:Uploaded: 2024-03-01 03:12:20


.. image:: https://v6d.io/_static/vineyard_logo.png
   :target: https://v6d.io
   :align: center
   :alt: vineyard
   :width: 397px

vineyard-io: IO drivers for `vineyard <https://v6d.io>`_
--------------------------------------------------------

vineyard-io is a collection of IO drivers for `vineyard <https://v6d.io>`_. It currently supports the following storage backends:

* Local filesystem
* AWS S3
* Aliyun OSS
* Hadoop filesystem

The vineyard-io package builds on `filesystem-spec <http://filesystem-spec.readthedocs.io/>`_
to support other storage sinks and sources in a unified fashion. Other filesystem implementations
that work with fsspec can be plugged in as well.

IO Adaptors
~~~~~~~~~~~

Vineyard ships a set of prebuilt IO adaptors that serve as common routines for
various IO operations and can replace the boilerplate parts of computation tasks.

Vineyard can read data from and write data to multiple file systems.
Behind the scenes, it uses :code:`fsspec` to delegate the workload to the various file system implementations.

Parameters for the underlying file system are specified through the :code:`storage_options` parameter,
a dict of additional keyword arguments that is passed to the file system.
For instance, combining :code:`path = "hdfs:///path/to/file"` with :code:`storage_options = {"host": "localhost", "port": 9600}`
reads from HDFS.

Note that :code:`storage_options` must be base64-encoded before it is passed to the scripts.
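
A minimal sketch of preparing :code:`storage_options` for the scripts follows. It assumes the scripts
expect the dict serialized as JSON and then base64-encoded; the JSON step is an assumption, since the
text above only mentions base64. The helper names :code:`encode_storage_options` and
:code:`decode_storage_options` are hypothetical and not part of vineyard-io.

.. code:: python

    import base64
    import json


    def encode_storage_options(options: dict) -> str:
        """Serialize a storage_options dict to JSON, then base64-encode it.

        Assumption: the vineyard-io scripts decode this argument with
        base64 followed by JSON parsing.
        """
        raw = json.dumps(options).encode("utf-8")
        return base64.b64encode(raw).decode("ascii")


    def decode_storage_options(encoded: str) -> dict:
        """Inverse of encode_storage_options, handy for checking the round trip."""
        return json.loads(base64.b64decode(encoded).decode("utf-8"))


    if __name__ == "__main__":
        opts = {"host": "localhost", "port": 9600}
        encoded = encode_storage_options(opts)
        print(encoded)  # the string to pass as <storage_options>
        assert decode_storage_options(encoded) == opts

The resulting string is plain ASCII, so it can be passed as a single shell argument without further quoting.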

Alternatively, the connection information can be encoded in the path itself,
e.g. :code:`hdfs://<ip>:<port>/path/to/file`.
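
To illustrate that the two forms carry the same information, the hypothetical helper below splits an
:code:`hdfs://<ip>:<port>/...` URL into a host-less path plus a :code:`storage_options` dict using the
:code:`host` and :code:`port` keys from the earlier example. It is a sketch built on the standard
library's :code:`urllib.parse`, not a vineyard-io API.

.. code:: python

    from typing import Dict, Tuple
    from urllib.parse import urlsplit


    def split_hdfs_url(url: str) -> Tuple[str, Dict[str, object]]:
        """Move host/port from the URL authority into a storage_options dict.

        Hypothetical helper; query strings and fragments are dropped.
        """
        parts = urlsplit(url)
        options: Dict[str, object] = {}
        if parts.hostname is not None:
            options["host"] = parts.hostname
        if parts.port is not None:
            options["port"] = parts.port
        # Rebuild the URL with an empty authority, e.g. hdfs:///path/to/file
        bare = f"{parts.scheme}://{parts.path}"
        return bare, options


    if __name__ == "__main__":
        print(split_hdfs_url("hdfs://10.0.0.1:9000/path/to/file"))

A URL that already has an empty authority (:code:`hdfs:///path/to/file`) is returned unchanged with an empty dict.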

To read from multiple files, pass a glob string or a list of paths,
with the caveat that all of them must use the same protocol.
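
Because mixing protocols is not allowed, it can help to validate a path list before launching a script.
The sketch below is a hypothetical pre-flight check, not part of vineyard-io; it treats paths without a
scheme as local files (:code:`file`), which matches fsspec's default protocol, and assumes POSIX-style paths.

.. code:: python

    from urllib.parse import urlsplit


    def common_protocol(paths) -> str:
        """Return the single protocol shared by all paths, or raise ValueError.

        Accepts one glob string or a list of paths (hypothetical helper).
        """
        if isinstance(paths, str):
            paths = [paths]
        if not paths:
            raise ValueError("no paths given")
        # Plain local paths have no scheme; count them as "file".
        protocols = {urlsplit(p).scheme or "file" for p in paths}
        if len(protocols) != 1:
            raise ValueError(f"paths mix protocols: {sorted(protocols)}")
        return protocols.pop()


    if __name__ == "__main__":
        print(common_protocol(["s3://bucket/a.csv", "s3://bucket/b.csv"]))
        print(common_protocol("/tmp/data/*.csv"))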

The available adaptors are described below:

+ :code:`read_bytes`

  .. code:: console

    Usage: vineyard_read_bytes <ipc_socket> <path> <storage_options> <read_options> <proc_num> <proc_index>

  Read a file from the local file system, OSS, HDFS, S3, etc. into a :code:`ByteStream`.

+ :code:`write_bytes`

  .. code:: console

    Usage: vineyard_write_bytes <ipc_socket> <path> <stream_id> <storage_options> <write_options> <proc_num> <proc_index>

  Write a :code:`ByteStream` to a file on the local file system, OSS, HDFS, S3, etc.

+ :code:`read_orc`

  .. code:: console

    Usage: vineyard_read_orc <ipc_socket> <path/directory> <storage_options> <read_options> <proc_num> <proc_index>

  Read an ORC file from the local file system, OSS, HDFS, S3, etc. into a :code:`DataframeStream`.

+ :code:`write_orc`

  .. code:: console

    Usage: vineyard_write_orc <ipc_socket> <path/directory> <stream_id> <storage_options> <write_options> <proc_num> <proc_index>

  Write a :code:`DataframeStream` to an ORC file on the local file system, OSS, HDFS, S3, etc.

+ :code:`read_vineyard_dataframe`

  .. code:: console

    Usage: vineyard_read_vineyard_dataframe <ipc_socket> <vineyard_address> <storage_options> <read_options> <proc_num> <proc_index>

  Read a :code:`DataFrame` in vineyard as a :code:`DataframeStream`.

+ :code:`write_vineyard_dataframe`

  .. code:: console

    Usage: vineyard_write_vineyard_dataframe <ipc_socket> <stream_id> <proc_num> <proc_index>

  Write a :code:`DataframeStream` to a :code:`DataFrame` in vineyard.

+ :code:`serializer`

  .. code:: console

    Usage: vineyard_serializer <ipc_socket> <object_id>

  Serialize a vineyard object (non-global or global) as a :code:`ByteStream` or a set of :code:`ByteStream` objects (a :code:`StreamCollection`).

+ :code:`deserializer`

  .. code:: console

    Usage: vineyard_deserializer <ipc_socket> <object_id>

  Deserialize a :code:`ByteStream` or a set of :code:`ByteStream` objects (a :code:`StreamCollection`) as a vineyard object.

+ :code:`read_bytes_collection`

  .. code:: console

    Usage: vineyard_read_bytes_collection <ipc_socket> <prefix> <storage_options> <proc_num> <proc_index>

  Read a directory (on the local file system, OSS, HDFS, S3, etc.) as a :code:`ByteStream` or a set of :code:`ByteStream` objects (a :code:`StreamCollection`).

+ :code:`write_bytes_collection`

  .. code:: console

    Usage: vineyard_write_bytes_collection <ipc_socket> <prefix> <stream_id> <storage_options> <proc_num> <proc_index>

  Write a :code:`ByteStream` or a set of :code:`ByteStream` objects (a :code:`StreamCollection`) to a directory (on the local file system, OSS, HDFS, S3, etc.).

+ :code:`parse_bytes_to_dataframe`

  .. code:: console

    Usage: vineyard_parse_bytes_to_dataframe <ipc_socket> <stream_id> <proc_num> <proc_index>

  Parse a :code:`ByteStream` (in CSV format) into a :code:`DataframeStream`.

+ :code:`parse_dataframe_to_bytes`

  .. code:: console

    Usage: vineyard_parse_dataframe_to_bytes <ipc_socket> <stream_id> <proc_num> <proc_index>

  Serialize a :code:`DataframeStream` to a :code:`ByteStream` (in CSV format).

+ :code:`dump_dataframe`

  .. code:: console

    Usage: vineyard_dump_dataframe <ipc_socket> <stream_id>

  Dump the content of a :code:`DataframeStream` for debugging purposes.
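
Several adaptors take :code:`<proc_num> <proc_index>`, which suggests they are launched as a group of
:code:`proc_num` cooperating processes, each with its own index. The sketch below builds one
:code:`vineyard_read_bytes` command line per worker. It assumes indices run from 0 to
:code:`proc_num - 1` and that both :code:`storage_options` and :code:`read_options` are passed as
base64-encoded JSON; these are assumptions, and the helper :code:`read_bytes_commands` and the socket
path are hypothetical.

.. code:: python

    import base64
    import json


    def _b64_json(obj: dict) -> str:
        # Assumption: option dicts are passed as base64-encoded JSON strings.
        return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")


    def read_bytes_commands(ipc_socket, path, storage_options, read_options, proc_num):
        """Build one vineyard_read_bytes argv per worker (hypothetical helper)."""
        if proc_num < 1:
            raise ValueError("proc_num must be at least 1")
        so = _b64_json(storage_options)
        ro = _b64_json(read_options)
        # Argument order follows the read_bytes usage line above.
        return [
            ["vineyard_read_bytes", ipc_socket, path, so, ro, str(proc_num), str(index)]
            for index in range(proc_num)
        ]


    if __name__ == "__main__":
        for argv in read_bytes_commands(
            "/var/run/vineyard.sock", "s3://bucket/data.csv", {}, {}, proc_num=2
        ):
            # Each argv could be handed to subprocess.Popen(argv) on its worker.
            print(" ".join(argv))

Building argument lists rather than shell strings avoids quoting problems, and the same pattern applies to the other adaptors by following their usage lines.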


