s3pathlib


:Name: s3pathlib
:Version: 2.3.4
:Home page: None
:Summary: s3pathlib is a Python package that provides a Pythonic, object-oriented programming (OOP) interface for manipulating AWS S3 objects and directories. Its API is similar to the ``pathlib`` standard library and intuitive to use.
:Upload time: 2025-07-13 15:13:58
:Maintainer: Sanhe Hu
:Docs URL: None
:Author: Sanhe Hu
:Requires Python: <4.0,>=3.9
:License: Apache-2.0
:Requirements: boto-session-manager, boto3, botocore, func-args, iterproxy, jmespath, pathlib-mate, python-dateutil, s3transfer, six, smart-open, urllib3, wrapt

.. image:: https://readthedocs.org/projects/s3pathlib/badge/?version=latest
    :target: https://s3pathlib.readthedocs.io/en/latest/
    :alt: Documentation Status

.. image:: https://github.com/MacHu-GWU/s3pathlib-project/actions/workflows/main.yml/badge.svg
    :target: https://github.com/MacHu-GWU/s3pathlib-project/actions?query=workflow:CI

.. image:: https://codecov.io/gh/MacHu-GWU/s3pathlib-project/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/MacHu-GWU/s3pathlib-project

.. image:: https://img.shields.io/pypi/v/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/l/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/pyversions/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib
    
.. image:: https://img.shields.io/pypi/dm/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/badge/✍️_Release_History!--None.svg?style=social&logo=github
    :target: https://github.com/MacHu-GWU/s3pathlib-project/blob/main/release-history.rst

.. image:: https://img.shields.io/badge/⭐_Star_me_on_GitHub!--None.svg?style=social&logo=github
    :target: https://github.com/aws-samples/s3pathlib-project

------

.. image:: https://img.shields.io/badge/Link-API-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Download-blue.svg
    :target: https://pypi.org/pypi/s3pathlib#files


Welcome to ``s3pathlib`` Documentation
==============================================================================
`s3pathlib <https://s3pathlib.readthedocs.io/en/latest/>`_ is a Python package that offers an object-oriented programming (OOP) interface to work with AWS S3 objects and directories. Its API is designed to be similar to the standard library `pathlib <https://docs.python.org/3/library/pathlib.html>`_ and is user-friendly. The package also `supports versioning <https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html>`_ in AWS S3.

.. note::

    You may not be viewing the full document. `The full document is here <https://s3pathlib.readthedocs.io/en/latest/>`_.


Quick Start
------------------------------------------------------------------------------
.. note::

    `The comprehensive guide, feature list, and best practices can be found here <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_.

**Import the library, declare an S3Path object**

.. code-block:: python

    # import
    >>> from s3pathlib import S3Path

    # construct from string, auto join parts
    >>> p = S3Path("bucket", "folder", "file.txt")
    # construct from S3 URI works too
    >>> p = S3Path("s3://bucket/folder/file.txt")
    # construct from S3 ARN works too
    >>> p = S3Path("arn:aws:s3:::bucket/folder/file.txt")
    >>> p.bucket
    'bucket'
    >>> p.key
    'folder/file.txt'
    >>> p.uri
    's3://bucket/folder/file.txt'
    >>> p.console_url # click to preview it in AWS console
    'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'
    >>> p.arn
    'arn:aws:s3:::bucket/folder/file.txt'
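
All three constructor forms above resolve to the same bucket and key. As a rough illustration of that mapping, here is a standalone sketch (not ``s3pathlib``'s actual implementation; ``split_s3_location`` is a hypothetical helper) that strips the URI or ARN prefix and splits the remainder on the first ``/``:

.. code-block:: python

    def split_s3_location(location: str) -> tuple[str, str]:
        """Split an S3 URI or ARN into (bucket, key). Hypothetical helper."""
        for prefix in ("s3://", "arn:aws:s3:::"):
            if location.startswith(prefix):
                remainder = location[len(prefix):]
                break
        else:
            raise ValueError(f"not an S3 URI or ARN: {location!r}")
        # Everything before the first "/" is the bucket; the rest is the key.
        bucket, _, key = remainder.partition("/")
        return bucket, key


    print(split_s3_location("s3://bucket/folder/file.txt"))
    print(split_s3_location("arn:aws:s3:::bucket/folder/file.txt"))

Both calls should yield the same ``("bucket", "folder/file.txt")`` pair, which mirrors the ``p.bucket`` and ``p.key`` values shown above.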

**Talk to AWS S3 and get some information**

.. code-block:: python

    # s3pathlib maintains a "context" object that holds the AWS authentication information
    # you just need to build your own boto session object and attach to it
    >>> import boto3
    >>> from s3pathlib import context
    >>> context.attach_boto_session(
    ...     boto3.session.Session(
    ...         region_name="us-east-1",
    ...         profile_name="my_aws_profile",
    ...     )
    ... )

    >>> p = S3Path("bucket", "folder", "file.txt")
    >>> p.write_text("a lot of data ...")
    >>> p.etag
    '3e20b77868d1a39a587e280b99cec4a8'
    >>> p.size
    56789000
    >>> p.size_for_human
    '54.16 MB'

    # folders work too, you just need a trailing "/" to identify them
    >>> p = S3Path("bucket", "datalake/")
    >>> p.count_objects()
    7164 # number of files under this prefix
    >>> p.calculate_total_size()
    (7164, 236483701963) # 7164 objects, 220.24 GB
    >>> p.calculate_total_size(for_human=True)
    (7164, '220.24 GB') # 7164 objects, 220.24 GB
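
The folder total above (236483701963 bytes shown as 220.24 GB) suggests that the human-readable sizes use 1024-based units with two decimals. Under that assumption, the conversion can be sketched as follows; ``format_size`` is a hypothetical stand-in, not the library's own function:

.. code-block:: python

    def format_size(size_in_bytes: int) -> str:
        """Format a byte count with 1024-based units and two decimals."""
        units = ["B", "KB", "MB", "GB", "TB", "PB"]
        value = float(size_in_bytes)
        for unit in units:
            # Stop once the value fits the current unit, or units run out.
            if value < 1024 or unit == units[-1]:
                return f"{value:.2f} {unit}"
            value /= 1024


    print(format_size(236483701963))  # the folder total from the example above

Dividing by 1024 three times turns 236483701963 bytes into roughly 220.24, which matches the ``for_human=True`` output above.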

**Manipulate Folders in S3**

The native S3 write APIs (operations that change the state of S3) work only at the object level, and the `list_objects <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2>`_ API returns at most 1,000 objects per call, so manipulating objects recursively takes extra effort. ``s3pathlib`` **can save you that effort.**
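
For comparison, here is roughly what that extra effort looks like with a plain boto3 client. This is a minimal sketch, not how ``s3pathlib`` works internally: ``iter_keys`` and ``delete_prefix`` are hypothetical helpers, and ``s3_client`` is assumed to be an object such as ``boto3.client("s3")``.

.. code-block:: python

    from typing import Iterator


    def iter_keys(s3_client, bucket: str, prefix: str) -> Iterator[str]:
        """Yield every key under a prefix, following pagination."""
        paginator = s3_client.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            # "Contents" is absent from a page that has no objects.
            for obj in page.get("Contents", []):
                yield obj["Key"]


    def delete_prefix(s3_client, bucket: str, prefix: str) -> int:
        """Delete all objects under a prefix in batches; return the count."""
        deleted = 0
        batch = []
        for key in iter_keys(s3_client, bucket, prefix):
            batch.append({"Key": key})
            if len(batch) == 1000:  # delete_objects accepts at most 1000 keys
                s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
                deleted += len(batch)
                batch = []
        if batch:  # flush the final, partially filled batch
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
            deleted += len(batch)
        return deleted

With ``s3pathlib``, the same kind of folder-level work is a single method call: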

.. code-block:: python

    # declare an S3 folder (the trailing "/" marks it as a folder)
    >>> p = S3Path("bucket", "github", "repos", "my-repo/")

    # upload all Python files from /my-repo to s3://bucket/github/repos/my-repo/
    >>> p.upload_dir("/my-repo", pattern="**/*.py", overwrite=False)

    # copy the entire S3 folder to another S3 folder
    >>> p2 = S3Path("bucket", "github", "repos", "another-repo/")
    >>> p.copy_to(p2, overwrite=True)

    # delete all objects in the folder, recursively, to clean up your test bucket
    >>> p.delete()
    >>> p2.delete()

**S3 Path Filter**

Ever wanted to filter S3 objects by attributes such as dirname, basename, file extension, ETag, size, or modified time? In Python it should be simple:

.. code-block:: python

    >>> s3bkt = S3Path("bucket") # assume you have a lot of files in this bucket
    >>> iterproxy = s3bkt.iter_objects().filter(
    ...     S3Path.size >= 10_000_000, S3Path.ext == ".csv" # add filter
    ... )

    >>> iterproxy.one() # fetch one
    S3Path('s3://bucket/larger-than-10MB-1.csv')

    >>> iterproxy.many(3) # fetch three
    [
        S3Path('s3://bucket/larger-than-10MB-1.csv'),
        S3Path('s3://bucket/larger-than-10MB-2.csv'),
        S3Path('s3://bucket/larger-than-10MB-3.csv'),
    ]

    >>> for p in iterproxy: # iter the rest
    ...     print(p)
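
The "fetch one, fetch three, iterate over the rest" pattern above reads like a one-pass lazy iterator. Assuming that is the behaviour, the idea can be sketched in plain Python with a generator and ``itertools.islice``; ``lazy_filter`` and the sample records are hypothetical and only stand in for a real S3 listing:

.. code-block:: python

    from itertools import islice
    from typing import Callable, Iterable, Iterator


    def lazy_filter(
        items: Iterable[dict], *predicates: Callable[[dict], bool]
    ) -> Iterator[dict]:
        """Yield items that satisfy every predicate, evaluated lazily."""
        return (item for item in items if all(p(item) for p in predicates))


    # Stand-in records; real code would get these from an S3 listing.
    objects = [
        {"key": "a.csv", "size": 12_000_000},
        {"key": "b.txt", "size": 50_000_000},
        {"key": "c.csv", "size": 500},
        {"key": "d.csv", "size": 30_000_000},
        {"key": "e.csv", "size": 11_000_000},
    ]

    matches = lazy_filter(
        objects,
        lambda o: o["size"] >= 10_000_000,
        lambda o: o["key"].endswith(".csv"),
    )

    first = next(matches)                # like .one(): consume one match
    next_two = list(islice(matches, 2))  # like .many(2): consume two more
    rest = list(matches)                 # whatever is left in the iterator

Because the generator is consumed as it goes, each step picks up where the previous one stopped, and nothing is evaluated until it is requested.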


**File Like Object for Simple IO**

``S3Path`` behaves like a file-like object: it supports ``open`` and the context manager syntax out of the box. Here are a few highlights:

.. code-block:: python

    # Stream big file by line
    >>> p = S3Path("bucket", "log.txt")
    >>> with p.open("r") as f:
    ...     for line in f:
    ...         print(line)  # do whatever you want with each line

    # JSON io
    >>> import json
    >>> p = S3Path("bucket", "config.json")
    >>> with p.open("w") as f:
    ...     json.dump({"password": "mypass"}, f)

    # pandas IO
    >>> import pandas as pd
    >>> p = S3Path("bucket", "dataset.csv")
    >>> df = pd.DataFrame(...)
    >>> with p.open("w") as f:
    ...     df.to_csv(f)
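
These examples work because functions such as ``json.dump`` only need a text file-like object to write to, and iterating over a text handle yields one line at a time. The sketch below shows the same pattern with ``io.StringIO`` standing in for the handle returned by ``p.open(...)``; ``save_config`` and ``count_lines`` are hypothetical helpers:

.. code-block:: python

    import io
    import json


    def save_config(config: dict, f) -> None:
        """Write config as JSON to any text file-like object."""
        json.dump(config, f)


    def count_lines(f) -> int:
        """Stream a text file-like object line by line and count the lines."""
        return sum(1 for _ in f)


    buffer = io.StringIO()  # stand-in for the handle from p.open("w")
    save_config({"password": "mypass"}, buffer)
    print(buffer.getvalue())

    log = io.StringIO("line 1\nline 2\nline 3\n")  # stand-in for p.open("r")
    print(count_lines(log))

Code written against a generic file handle like this should work unchanged whether the handle comes from a local file, an in-memory buffer, or an S3 object.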

Now that you have a basic understanding of s3pathlib, let's read the `full document <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_ to explore its capabilities in greater depth.


Getting Help
------------------------------------------------------------------------------
Please use the ``python-s3pathlib`` tag on Stack Overflow to get help.

Submit an ``I want help`` issue ticket on `GitHub Issues <https://github.com/aws-samples/s3pathlib-project/issues/new/choose>`_.


Contributing
------------------------------------------------------------------------------
Please see the `Contribution Guidelines <https://github.com/aws-samples/s3pathlib-project/blob/main/CONTRIBUTING.rst>`_.


Copyright
------------------------------------------------------------------------------
s3pathlib is an open source project. See the `LICENSE <https://github.com/aws-samples/s3pathlib-project/blob/main/LICENSE>`_ file for more information.


            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "s3pathlib",
    "maintainer": "Sanhe Hu",
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": "husanhe@email.com",
    "keywords": null,
    "author": "Sanhe Hu",
    "author_email": "husanhe@email.com",
    "download_url": "https://files.pythonhosted.org/packages/21/b6/80ef2d2c25bb341311ea1f8a839cd046a9b0bdcf997e0d96133ec05a0582/s3pathlib-2.3.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "s3pathlib is the python package provides the Pythonic objective oriented programming (OOP) interface to manipulate AWS S3 object / directory. The api is similar to the pathlib standard library and very intuitive for human.",
    "version": "2.3.4",
    "project_urls": {
        "Changelog": "https://github.com/MacHu-GWU/s3pathlib-project/blob/main/release-history.rst",
        "Documentation": "https://s3pathlib.readthedocs.io/en/latest/",
        "Download": "https://pypi.org/pypi/s3pathlib#files",
        "Homepage": "https://github.com/MacHu-GWU/s3pathlib-project",
        "Issues": "https://github.com/MacHu-GWU/s3pathlib-project/issues",
        "Repository": "https://github.com/MacHu-GWU/s3pathlib-project"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "ebf1761248bfa050a4baef47108c892e57dbb33afb3bfdb31fa1eb091eb80d4e",
                "md5": "0d00176bb38179cc56561e42339f96b0",
                "sha256": "11e6b04b94f08baca6da00cba7c428718626362d9d64fbecc2abad97e1d8f9a4"
            },
            "downloads": -1,
            "filename": "s3pathlib-2.3.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "0d00176bb38179cc56561e42339f96b0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 72655,
            "upload_time": "2025-07-13T15:13:57",
            "upload_time_iso_8601": "2025-07-13T15:13:57.560694Z",
            "url": "https://files.pythonhosted.org/packages/eb/f1/761248bfa050a4baef47108c892e57dbb33afb3bfdb31fa1eb091eb80d4e/s3pathlib-2.3.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "21b680ef2d2c25bb341311ea1f8a839cd046a9b0bdcf997e0d96133ec05a0582",
                "md5": "cacd791efd57d41c46b7a62efd70672e",
                "sha256": "7a3e38e29946c776b99beea132ebbe1209653cfb17ea301d485a5738fc533df4"
            },
            "downloads": -1,
            "filename": "s3pathlib-2.3.4.tar.gz",
            "has_sig": false,
            "md5_digest": "cacd791efd57d41c46b7a62efd70672e",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 51846,
            "upload_time": "2025-07-13T15:13:58",
            "upload_time_iso_8601": "2025-07-13T15:13:58.957709Z",
            "url": "https://files.pythonhosted.org/packages/21/b6/80ef2d2c25bb341311ea1f8a839cd046a9b0bdcf997e0d96133ec05a0582/s3pathlib-2.3.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-13 15:13:58",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "MacHu-GWU",
    "github_project": "s3pathlib-project",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [
        {
            "name": "boto-session-manager",
            "specs": [
                [
                    "==",
                    "1.8.1"
                ]
            ]
        },
        {
            "name": "boto3",
            "specs": [
                [
                    "==",
                    "1.38.11"
                ]
            ]
        },
        {
            "name": "botocore",
            "specs": [
                [
                    "==",
                    "1.38.11"
                ]
            ]
        },
        {
            "name": "func-args",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "iterproxy",
            "specs": [
                [
                    "==",
                    "0.3.1"
                ]
            ]
        },
        {
            "name": "jmespath",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pathlib-mate",
            "specs": [
                [
                    "==",
                    "1.3.2"
                ]
            ]
        },
        {
            "name": "python-dateutil",
            "specs": [
                [
                    "==",
                    "2.9.0.post0"
                ]
            ]
        },
        {
            "name": "s3transfer",
            "specs": [
                [
                    "==",
                    "0.12.0"
                ]
            ]
        },
        {
            "name": "six",
            "specs": [
                [
                    "==",
                    "1.17.0"
                ]
            ]
        },
        {
            "name": "smart-open",
            "specs": [
                [
                    "==",
                    "7.1.0"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "1.26.20"
                ]
            ]
        },
        {
            "name": "urllib3",
            "specs": [
                [
                    "==",
                    "2.4.0"
                ]
            ]
        },
        {
            "name": "wrapt",
            "specs": [
                [
                    "==",
                    "1.17.2"
                ]
            ]
        }
    ],
    "lcname": "s3pathlib"
}
        