hadoop-fs-wrapper


Name: hadoop-fs-wrapper
Version: 0.7.1
Home page: https://github.com/SneaksAndData/hadoop-fs-wrapper
Summary: Python Wrapper for Hadoop Java API
Upload time: 2024-10-24 11:28:12
Maintainer: GZU
Docs URL: None
Author: ECCO Sneaks & Data
Requires Python: <3.13,>=3.11
License: MIT
Requirements: No requirements were recorded.
# Hadoop FileSystem Java Class Wrapper
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Typed Python wrappers for [Hadoop FileSystem](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html) class family.

## Installation
You can install this package from PyPI on any Hadoop or Spark runtime:
```commandline
pip install hadoop-fs-wrapper
```

Select a version that matches the Hadoop version you are using (see the example after the table):

| Hadoop Version / Spark version | Compatible hadoop-fs-wrapper version |
|--------------------------------|:------------------------------------:|
| 3.2.x / 3.2.x                  |                0.4.x                 |
| 3.3.x / 3.3.x                  |             0.4.x, 0.5.x             |
| 3.3.x / 3.4.x                  |                0.6.x                 |
| 3.5.x / 3.5.x                  |                0.7.x                 |
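
For example, to stay on the 0.7.x release line (the series listed above for Hadoop 3.5.x / Spark 3.5.x), you could pin the minor version when installing:
```commandline
pip install "hadoop-fs-wrapper~=0.7.0"
```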

## Usage
A common use case is accessing the Hadoop FileSystem from a Spark session object:

```python
from hadoop_fs_wrapper.wrappers.file_system import FileSystem

file_system = FileSystem.from_spark_session(spark=spark_session)
```
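
Here `spark_session` is assumed to be an existing `pyspark.sql.SparkSession`. A minimal sketch of creating one in local mode and wrapping it (the application name is illustrative):
```python
from pyspark.sql import SparkSession

from hadoop_fs_wrapper.wrappers.file_system import FileSystem

# Start (or reuse) a local Spark session; "fs-wrapper-example" is an illustrative app name.
spark_session = (
    SparkSession.builder
    .master("local[*]")
    .appName("fs-wrapper-example")
    .getOrCreate()
)

# Wrap the Hadoop FileSystem bound to this session.
file_system = FileSystem.from_spark_session(spark=spark_session)
```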

Then, for example, one can check whether there are any files under a specified path:
```python
from hadoop_fs_wrapper.wrappers.file_system import FileSystem

def is_valid_source_path(file_system: FileSystem, path: str) -> bool:
    """
     Checks whether a regexp path refers to a valid set of paths
    :param file_system: pyHadooopWrapper FileSystem
    :param path: path e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv
    :return: true if path resolves to existing paths, otherwise false
    """
    return len(file_system.glob_status(path)) > 0
```
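
A hypothetical call site, reusing the `file_system` object from the previous snippet (the glob below is purely illustrative):
```python
# Illustrative glob over a local filesystem path.
if is_valid_source_path(file_system, "file:///tmp/landing/part*.csv"):
    print("At least one file matches the pattern")
else:
    print("No files found for the given glob")
```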

## Contribution

Currently, basic filesystem operations (listing, deleting, searching, iterative listing, etc.) are supported. If an operation you require is not yet wrapped,
please open an issue or create a PR.

All changes are tested against Spark 3.4 running in local mode.


Raw data

            {
    "_id": null,
    "home_page": "https://github.com/SneaksAndData/hadoop-fs-wrapper",
    "name": "hadoop-fs-wrapper",
    "maintainer": "GZU",
    "docs_url": null,
    "requires_python": "<3.13,>=3.11",
    "maintainer_email": "gzu@ecco.com",
    "keywords": null,
    "author": "ECCO Sneaks & Data",
    "author_email": "esdsupport@ecco.com",
    "download_url": "https://files.pythonhosted.org/packages/d0/4f/8790e7eafb2595df66ee7b6b926c71bc0dbd7a4b49cf93dc3b187edb1c4e/hadoop_fs_wrapper-0.7.1.tar.gz",
    "platform": null,
    "description": "# Hadoop FileSystem Java Class Wrapper \n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n\nTyped Python wrappers for [Hadoop FileSystem](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html) class family.\n\n## Installation\nYou can install this package from `pypi` on any Hadoop or Spark runtime:\n```commandline\npip install hadoop-fs-wrapper\n```\n\nSelect a version that matches hadoop version you are using:\n\n| Hadoop Version / Spark version | Compatible hadoop-fs-wrapper version |\n|--------------------------------|:------------------------------------:|\n| 3.2.x / 3.2.x                  |                0.4.x                 |\n| 3.3.x / 3.3.x                  |             0.4.x, 0.5.x             |\n| 3.3.x / 3.4.x                  |                0.6.x                 |\n| 3.5.x / 3.5.x                  |                0.7.x                 |\n\n## Usage\nCommon use case is accessing Hadoop FileSystem from Spark session object:\n\n```python\nfrom hadoop_fs_wrapper.wrappers.file_system import FileSystem\n\nfile_system = FileSystem.from_spark_session(spark=spark_session)\n```\n\nThen, for example, one can check if there are any files under specified path:\n```python\nfrom hadoop_fs_wrapper.wrappers.file_system import FileSystem\n\ndef is_valid_source_path(file_system: FileSystem, path: str) -> bool:\n    \"\"\"\n     Checks whether a regexp path refers to a valid set of paths\n    :param file_system: pyHadooopWrapper FileSystem\n    :param path: path e.g. (s3a|abfss|file|...)://hello@world.com/path/part*.csv\n    :return: true if path resolves to existing paths, otherwise false\n    \"\"\"\n    return len(file_system.glob_status(path)) > 0\n```\n\n## Contribution\n\nCurrently basic filesystem operations (listing, deleting, search, iterative listing etc.) are supported. If an operation you require is not yet wrapped,\nplease open an issue or create a PR.\n\nAll changes are tested against Spark 3.4 running in local mode.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Python Wrapper for Hadoop Java API",
    "version": "0.7.1",
    "project_urls": {
        "Homepage": "https://github.com/SneaksAndData/hadoop-fs-wrapper",
        "Repository": "https://github.com/SneaksAndData/hadoop-fs-wrapper"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "39a99c388bc872598a029d2f72f2056b6007220657a2beadace732c910a2780b",
                "md5": "5b7d918200c95a9faf30a3f38125091d",
                "sha256": "a3b66a02aa0d6af9375471b728313e26ed376371fe0d0f2af93fd7a8fe8752d3"
            },
            "downloads": -1,
            "filename": "hadoop_fs_wrapper-0.7.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5b7d918200c95a9faf30a3f38125091d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13,>=3.11",
            "size": 24900,
            "upload_time": "2024-10-24T11:28:11",
            "upload_time_iso_8601": "2024-10-24T11:28:11.516996Z",
            "url": "https://files.pythonhosted.org/packages/39/a9/9c388bc872598a029d2f72f2056b6007220657a2beadace732c910a2780b/hadoop_fs_wrapper-0.7.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d04f8790e7eafb2595df66ee7b6b926c71bc0dbd7a4b49cf93dc3b187edb1c4e",
                "md5": "057ff80192d0b39dca46ababc7532723",
                "sha256": "d71f26974e9c81de0550003ac2a99fca3cc670bb52ddf48ecf432e43f434c89d"
            },
            "downloads": -1,
            "filename": "hadoop_fs_wrapper-0.7.1.tar.gz",
            "has_sig": false,
            "md5_digest": "057ff80192d0b39dca46ababc7532723",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13,>=3.11",
            "size": 9364,
            "upload_time": "2024-10-24T11:28:12",
            "upload_time_iso_8601": "2024-10-24T11:28:12.525293Z",
            "url": "https://files.pythonhosted.org/packages/d0/4f/8790e7eafb2595df66ee7b6b926c71bc0dbd7a4b49cf93dc3b187edb1c4e/hadoop_fs_wrapper-0.7.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-24 11:28:12",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "SneaksAndData",
    "github_project": "hadoop-fs-wrapper",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "hadoop-fs-wrapper"
}
        