s3pathlib


:Name: s3pathlib
:Version: 2.3.1
:Home page: https://github.com/aws-samples/s3pathlib-project
:Summary: Objective Oriented Interface for AWS S3, similar to pathlib.
:Upload time: 2024-08-31 04:48:15
:Maintainer: Sanhe Hu
:Author: Sanhe Hu
:Requires Python: >=3.7
:License: Apache License 2.0

.. image:: https://readthedocs.org/projects/s3pathlib/badge/?version=latest
    :target: https://s3pathlib.readthedocs.io/en/latest/
    :alt: Documentation Status

.. image:: https://github.com/aws-samples/s3pathlib-project/workflows/CI/badge.svg
    :target: https://github.com/aws-samples/s3pathlib-project/actions?query=workflow:CI

.. image:: https://img.shields.io/badge/codecov-100%25-brightgreen
    :target: https://github.com/aws-samples/s3pathlib-project/actions?query=workflow:CI

.. image:: https://img.shields.io/pypi/v/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/l/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/pyversions/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib
    
.. image:: https://img.shields.io/pypi/dm/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/badge/STAR_Me_on_GitHub!--None.svg?style=social
    :target: https://github.com/aws-samples/s3pathlib-project

------

.. image:: https://img.shields.io/badge/Link-Document-orange.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/

.. image:: https://img.shields.io/badge/Link-API-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project

.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Download-blue.svg
    :target: https://pypi.org/pypi/s3pathlib#files


Welcome to ``s3pathlib`` Documentation
==============================================================================
`s3pathlib <https://s3pathlib.readthedocs.io/en/latest/>`_ is a Python package that offers an object-oriented programming (OOP) interface to work with AWS S3 objects and directories. Its API is designed to be similar to the standard library `pathlib <https://docs.python.org/3/library/pathlib.html>`_ and is user-friendly. The package also `supports versioning <https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html>`_ in AWS S3.

.. note::

    You may not be viewing the full documentation. The `full documentation is here <https://s3pathlib.readthedocs.io/en/latest/>`_.


Quick Start
------------------------------------------------------------------------------
.. note::

    A `comprehensive guide covering features and best practices is available here <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_.

**Import the library, declare an S3Path object**

.. code-block:: python

    # import
    >>> from s3pathlib import S3Path

    # construct from string, auto join parts
    >>> p = S3Path("bucket", "folder", "file.txt")
    # construct from S3 URI works too
    >>> p = S3Path("s3://bucket/folder/file.txt")
    # construct from S3 ARN works too
    >>> p = S3Path("arn:aws:s3:::bucket/folder/file.txt")
    >>> p.bucket
    'bucket'
    >>> p.key
    'folder/file.txt'
    >>> p.uri
    's3://bucket/folder/file.txt'
    >>> p.console_url # click to preview it in AWS console
    'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'
    >>> p.arn
    'arn:aws:s3:::bucket/folder/file.txt'
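
To make the relationship between the URI, the ARN, and the ``bucket`` / ``key`` attributes concrete, here is a minimal sketch in plain Python. It is not the library's implementation: ``split_s3_location`` is a hypothetical helper, and it only assumes that the bucket is everything before the first ``/`` after the ``s3://`` or ``arn:aws:s3:::`` prefix.

.. code-block:: python

    from typing import Tuple


    def split_s3_location(location: str) -> Tuple[str, str]:
        """Split an S3 URI or object ARN into (bucket, key). Hypothetical helper."""
        for prefix in ("s3://", "arn:aws:s3:::"):
            if location.startswith(prefix):
                remainder = location[len(prefix):]
                break
        else:
            raise ValueError(f"not an S3 URI or ARN: {location!r}")
        # Everything before the first "/" is the bucket; the rest is the key.
        bucket, _, key = remainder.partition("/")
        return bucket, key


    print(split_s3_location("s3://bucket/folder/file.txt"))
    print(split_s3_location("arn:aws:s3:::bucket/folder/file.txt"))

Both calls print ``('bucket', 'folder/file.txt')``, matching the ``p.bucket`` and ``p.key`` values shown above.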

**Talk to AWS S3 and get some information**

.. code-block:: python

    # s3pathlib maintains a "context" object that holds the AWS authentication information
    # you just need to build your own boto session object and attach to it
    >>> import boto3
    >>> from s3pathlib import context
    >>> context.attach_boto_session(
    ...     boto3.session.Session(
    ...         region_name="us-east-1",
    ...         profile_name="my_aws_profile",
    ...     )
    ... )

    >>> p = S3Path("bucket", "folder", "file.txt")
    >>> p.write_text("a lot of data ...")
    >>> p.etag
    '3e20b77868d1a39a587e280b99cec4a8'
    >>> p.size
    56789000
    >>> p.size_for_human
    '54.16 MB'

    # a folder works too; use a trailing "/" to mark the path as a folder
    >>> p = S3Path("bucket", "datalake/")
    >>> p.count_objects()
    7164 # number of files under this prefix
    >>> p.calculate_total_size()
    (7164, 236483701963) # 7164 objects, 220.24 GB
    >>> p.calculate_total_size(for_human=True)
    (7164, '220.24 GB') # 7164 objects, 220.24 GB
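
The human-readable total above is consistent with binary units (1 GB = 1024 * 1024 * 1024 bytes). The sketch below reproduces that arithmetic under this assumption. ``humanize_bytes`` is a hypothetical helper written for illustration, not a function provided by ``s3pathlib``.

.. code-block:: python

    def humanize_bytes(n_bytes: int, precision: int = 2) -> str:
        """Format a byte count with binary units (1 KB = 1024 B). Hypothetical helper."""
        units = ["B", "KB", "MB", "GB", "TB", "PB"]
        value = float(n_bytes)
        for unit in units[:-1]:
            # Stop at the first unit where the value drops below 1024.
            if value < 1024:
                return f"{value:.{precision}f} {unit}"
            value /= 1024
        return f"{value:.{precision}f} {units[-1]}"


    print(humanize_bytes(236483701963))  # the total size shown above

This prints ``220.24 GB``, the same figure reported for the ``datalake/`` prefix.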

**Manipulate Folder in S3**

The native S3 write APIs (the operations that change the state of S3) operate only at the object level, and the `list_objects <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2>`_ API returns at most 1000 objects per call, so manipulating objects recursively takes extra effort. ``s3pathlib`` **can save you that effort**:

.. code-block:: python

    # define an S3 folder
    >>> p = S3Path("bucket", "github", "repos", "my-repo/")

    # upload all Python files from /my-repo to s3://bucket/github/repos/my-repo/
    >>> p.upload_dir("/my-repo", pattern="**/*.py", overwrite=False)

    # copy the entire S3 folder to another S3 folder
    >>> p2 = S3Path("bucket", "github", "repos", "another-repo/")
    >>> p.copy_to(p2, overwrite=True)

    # delete all objects in the folder, recursively, to clean up your test bucket
    >>> p.delete()
    >>> p2.delete()
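
For comparison, this is roughly what the "additional effort" looks like with plain ``boto3``: every recursive operation has to walk the paginated listing first. The sketch uses the standard ``list_objects_v2`` paginator. ``iter_keys`` is a hypothetical helper name, and the client is passed in so that the function itself makes no assumption about how credentials are configured.

.. code-block:: python

    from typing import Any, Iterator


    def iter_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
        """Yield every object key under a prefix, following pagination."""
        paginator = s3_client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            # "Contents" is absent from a page when nothing matches.
            for obj in page.get("Contents", []):
                yield obj["Key"]


    # Example call (requires boto3 and valid AWS credentials):
    # import boto3
    # keys = list(iter_keys(boto3.client("s3"), "bucket", "github/repos/my-repo/"))

Copying or deleting a folder would then still need one request per key (or per batch of keys) on top of this loop, which is the bookkeeping that the ``S3Path`` methods above hide.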

**S3 Path Filter**

Ever wanted to filter S3 objects by their attributes, such as dirname, basename, file extension, etag, size, or modified time? It should be simple in Python:

.. code-block:: python

    >>> s3bkt = S3Path("bucket") # assume you have a lot of files in this bucket
    >>> iterproxy = s3bkt.iter_objects().filter(
    ...     S3Path.size >= 10_000_000, S3Path.ext == ".csv" # add filter
    ... )

    >>> iterproxy.one() # fetch one
    S3Path('s3://bucket/larger-than-10MB-1.csv')

    >>> iterproxy.many(3) # fetch three
    [
        S3Path('s3://bucket/larger-than-10MB-1.csv'),
        S3Path('s3://bucket/larger-than-10MB-2.csv'),
        S3Path('s3://bucket/larger-than-10MB-3.csv'),
    ]

    >>> for p in iterproxy: # iterate over the rest
    ...     print(p)
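
Conceptually, the filter acts as a predicate applied to each object as it is iterated. The plain-Python analogy below assumes that multiple criteria passed to ``filter()`` must all hold (a logical AND). ``FakeObject`` and ``large_csv`` are hypothetical names used only for illustration, so the example runs without touching S3.

.. code-block:: python

    from dataclasses import dataclass
    from pathlib import PurePosixPath
    from typing import Iterable, Iterator


    @dataclass
    class FakeObject:
        """Hypothetical stand-in for an S3 object; not part of s3pathlib."""

        key: str
        size: int

        @property
        def ext(self) -> str:
            # File extension of the key, including the leading dot.
            return PurePosixPath(self.key).suffix


    def large_csv(objects: Iterable[FakeObject]) -> Iterator[FakeObject]:
        # Both conditions must hold, mirroring the two criteria in the filter above.
        return (o for o in objects if o.size >= 10_000_000 and o.ext == ".csv")


    objects = [
        FakeObject("small.csv", 1_000),
        FakeObject("big.csv", 25_000_000),
        FakeObject("big.json", 40_000_000),
    ]
    print([o.key for o in large_csv(objects)])

Only ``big.csv`` passes both conditions, so this prints ``['big.csv']``.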


**File Like Object for Simple IO**

``S3Path`` is a file-like object. It supports ``open`` and the context manager syntax out of the box. Here are a few highlights:

.. code-block:: python

    # Stream a big file line by line
    >>> p = S3Path("bucket", "log.txt")
    >>> with p.open("r") as f:
    ...     for line in f:
    ...         print(line)  # do whatever you want with each line

    # JSON IO
    >>> import json
    >>> p = S3Path("bucket", "config.json")
    >>> with p.open("w") as f:
    ...     json.dump({"password": "mypass"}, f)

    # pandas IO
    >>> import pandas as pd
    >>> p = S3Path("bucket", "dataset.csv")
    >>> df = pd.DataFrame(...)
    >>> with p.open("w") as f:
    ...     df.to_csv(f)
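
Because ``open`` follows the ``pathlib`` convention, a helper written against ``.open()`` can, in principle, accept either a local path or an ``S3Path``. The sketch below shows the reading side of the JSON example. ``load_json`` is a hypothetical helper, and it is demonstrated with a local ``pathlib.Path`` so that it runs without AWS credentials.

.. code-block:: python

    import json
    import tempfile
    from pathlib import Path
    from typing import Any


    def load_json(path: Any) -> Any:
        """Read JSON from any object exposing a pathlib-style open()."""
        with path.open("r") as f:
            return json.load(f)


    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "config.json"
        local.write_text('{"region": "us-east-1"}')
        print(load_json(local))

    # With a boto session attached, the same helper is expected to work for
    # S3, e.g. load_json(S3Path("bucket", "config.json")).

The local run prints ``{'region': 'us-east-1'}``.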

Now that you have a basic understanding of ``s3pathlib``, read the `full documentation <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_ to explore its capabilities in greater depth.


Getting Help
------------------------------------------------------------------------------
Please use the ``python-s3pathlib`` tag on Stack Overflow to get help.

Submit an ``I want help`` issue ticket on `GitHub Issues <https://github.com/aws-samples/s3pathlib-project/issues/new/choose>`_.


Contributing
------------------------------------------------------------------------------
Please see the `Contribution Guidelines <https://github.com/aws-samples/s3pathlib-project/blob/main/CONTRIBUTING.rst>`_.


Copyright
------------------------------------------------------------------------------
s3pathlib is an open source project. See the `LICENSE <https://github.com/aws-samples/s3pathlib-project/blob/main/LICENSE>`_ file for more information.

            
