.. image:: https://readthedocs.org/projects/s3pathlib/badge/?version=latest
    :target: https://s3pathlib.readthedocs.io/en/latest/
    :alt: Documentation Status

.. image:: https://github.com/aws-samples/s3pathlib-project/workflows/CI/badge.svg
    :target: https://github.com/aws-samples/s3pathlib-project/actions?query=workflow:CI

.. image:: https://img.shields.io/badge/codecov-100%25-brightgreen
    :target: https://github.com/aws-samples/s3pathlib-project/actions?query=workflow:CI

.. image:: https://img.shields.io/pypi/v/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/l/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/pyversions/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/pypi/dm/s3pathlib.svg
    :target: https://pypi.python.org/pypi/s3pathlib

.. image:: https://img.shields.io/badge/STAR_Me_on_GitHub!--None.svg?style=social
    :target: https://github.com/aws-samples/s3pathlib-project

------

.. image:: https://img.shields.io/badge/Link-Document-orange.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/

.. image:: https://img.shields.io/badge/Link-API-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Source_Code-blue.svg
    :target: https://s3pathlib.readthedocs.io/en/latest/py-modindex.html

.. image:: https://img.shields.io/badge/Link-Submit_Issue-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Request_Feature-blue.svg
    :target: https://github.com/aws-samples/s3pathlib-project/issues

.. image:: https://img.shields.io/badge/Link-Download-blue.svg
    :target: https://pypi.org/pypi/s3pathlib#files


Welcome to ``s3pathlib`` Documentation
==============================================================================
`s3pathlib <https://s3pathlib.readthedocs.io/en/latest/>`_ is a Python package that offers an object-oriented programming (OOP) interface to work with AWS S3 objects and directories. Its API is designed to be similar to the standard library `pathlib <https://docs.python.org/3/library/pathlib.html>`_ and is user-friendly. The package also `supports versioning <https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html>`_ in AWS S3.

.. note::

    You may not be viewing the full document, `FULL DOCUMENT IS HERE <https://s3pathlib.readthedocs.io/en/latest/>`_


Quick Start
------------------------------------------------------------------------------
.. note::

    `The COMPREHENSIVE guide, features, and best practices can be found HERE <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_

**Import the library, declare an S3Path object**

.. code-block:: python

    # import
    >>> from s3pathlib import S3Path

    # construct from string, auto join parts
    >>> p = S3Path("bucket", "folder", "file.txt")
    # construct from S3 URI works too
    >>> p = S3Path("s3://bucket/folder/file.txt")
    # construct from S3 ARN works too
    >>> p = S3Path("arn:aws:s3:::bucket/folder/file.txt")
    >>> p.bucket
    'bucket'
    >>> p.key
    'folder/file.txt'
    >>> p.uri
    's3://bucket/folder/file.txt'
    >>> p.console_url # click to preview it in AWS console
    'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'
    >>> p.arn
    'arn:aws:s3:::bucket/folder/file.txt'
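
To make the relationship between the URI, the ARN, the bucket, and the key concrete, here is a minimal standard-library sketch. It is not how ``s3pathlib`` is implemented; ``split_bucket_and_key`` and ``to_uri`` are hypothetical helper names, and only the standard ``aws`` partition prefix is handled.

```python
from typing import Tuple

S3_URI_PREFIX = "s3://"
S3_ARN_PREFIX = "arn:aws:s3:::"  # assumption: standard "aws" partition only


def split_bucket_and_key(location: str) -> Tuple[str, str]:
    """Split an S3 URI or ARN into (bucket, key). Hypothetical helper."""
    for prefix in (S3_URI_PREFIX, S3_ARN_PREFIX):
        if location.startswith(prefix):
            remainder = location[len(prefix):]
            break
    else:
        raise ValueError(f"not an S3 URI or ARN: {location!r}")
    # Everything before the first "/" is the bucket; the rest is the key.
    bucket, _, key = remainder.partition("/")
    return bucket, key


def to_uri(bucket: str, *parts: str) -> str:
    """Join a bucket and key parts into an S3 URI. Hypothetical helper."""
    return S3_URI_PREFIX + "/".join((bucket,) + parts)


print(split_bucket_and_key("s3://bucket/folder/file.txt"))
print(split_bucket_and_key("arn:aws:s3:::bucket/folder/file.txt"))
print(to_uri("bucket", "folder", "file.txt"))
```

Both calls to ``split_bucket_and_key`` yield the same ``("bucket", "folder/file.txt")`` pair, which mirrors why the three constructor forms above describe the same object.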

**Talk to AWS S3 and get some information**

.. code-block:: python

    # s3pathlib maintains a "context" object that holds the AWS authentication information
    # you just need to build your own boto session object and attach it to the context
    >>> import boto3
    >>> from s3pathlib import context
    >>> context.attach_boto_session(
    ...     boto3.session.Session(
    ...         region_name="us-east-1",
    ...         profile_name="my_aws_profile",
    ...     )
    ... )

    >>> p = S3Path("bucket", "folder", "file.txt")
    >>> p.write_text("a lot of data ...")
    >>> p.etag
    '3e20b77868d1a39a587e280b99cec4a8'
    >>> p.size
    56789000
    >>> p.size_for_human
    '54.16 MB'

    # folders work too, you just need a trailing "/" to mark the path as a folder
    >>> p = S3Path("bucket", "datalake/")
    >>> p.count_objects()
    7164 # number of files under this prefix
    >>> p.calculate_total_size()
    (7164, 236483701963) # 7164 objects, 220.24 GB
    >>> p.calculate_total_size(for_human=True)
    (7164, '220.24 GB') # 7164 objects, 220.24 GB
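
The human-readable figures above are consistent with 1024-based units (236483701963 bytes divided by 1024 three times is about 220.24). The sketch below reproduces that conversion in plain Python. ``format_size`` is a hypothetical helper written for illustration, and the assumption that the library uses exactly this rounding and these unit labels is inferred from the sample output only.

```python
def format_size(n_bytes: int, precision: int = 2) -> str:
    """Format a byte count using 1024-based units. Hypothetical helper."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.{precision}f} {unit}"
        value /= 1024  # move up to the next larger unit
    return f"{value:.{precision}f} {units[-1]}"


total_size = format_size(236483701963)
print(total_size)  # 220.24 GB
```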

**Manipulate Folder in S3**

The native S3 write APIs (the operations that change the state of S3) only operate at the object level, and the `list_objects_v2 <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2>`_ API returns at most 1000 objects per call. Manipulating objects recursively therefore takes additional effort. ``s3pathlib`` **CAN SAVE YOUR LIFE**

.. code-block:: python

    # declare an S3 folder
    >>> p = S3Path("bucket", "github", "repos", "my-repo/")

    # upload all python files from /my-repo to s3://bucket/github/repos/my-repo/
    >>> p.upload_dir("/my-repo", pattern="**/*.py", overwrite=False)

    # copy entire s3 folder to another s3 folder
    >>> p2 = S3Path("bucket", "github", "repos", "another-repo/")
    >>> p.copy_to(p2, overwrite=True)

    # delete all objects in the folder, recursively, to clean up your test bucket
    >>> p.delete()
    >>> p2.delete()
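
To show the "additional effort" that recursive operations need without a helper library, the following sketch simulates the two loops involved: following continuation tokens across 1000-key pages, then regrouping keys into batches of at most 1000 (the per-request limit of the S3 ``DeleteObjects`` API). No AWS call is made here. ``fake_list_page``, ``iter_all_keys``, and ``batched`` are hypothetical names, and the in-memory key list stands in for a real bucket.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

PAGE_SIZE = 1000  # S3 list calls return at most 1000 keys per response

Page = Tuple[List[str], Optional[int]]


def fake_list_page(all_keys: List[str], token: Optional[int]) -> Page:
    """Stand-in for one list call: one page of keys plus the next token."""
    start = token or 0
    next_start = start + PAGE_SIZE
    next_token = next_start if next_start < len(all_keys) else None
    return all_keys[start:next_start], next_token


def iter_all_keys(list_page: Callable[[Optional[int]], Page]) -> Iterator[str]:
    """Follow continuation tokens until the listing is exhausted."""
    token = None
    while True:
        page, token = list_page(token)
        yield from page
        if token is None:
            break


def batched(keys: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Group keys into lists of at most ``size`` items, one per bulk request."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


keys = [f"github/repos/my-repo/file-{i}.py" for i in range(2500)]
listed = iter_all_keys(lambda token: fake_list_page(keys, token))
batch_sizes = [len(batch) for batch in batched(listed)]
print(batch_sizes)  # [1000, 1000, 500]
```

With real boto3 code, each page would come from a list call and each batch would go to one bulk request; the folder-level methods shown above hide this bookkeeping.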

**S3 Path Filter**

Ever wanted to filter S3 objects by their attributes, such as dirname, basename, file extension, etag, size, or modified time? It should be simple in Python:

.. code-block:: python

    >>> s3bkt = S3Path("bucket") # assume you have lots of files in this bucket
    >>> iterproxy = s3bkt.iter_objects().filter(
    ...     S3Path.size >= 10_000_000, S3Path.ext == ".csv" # add filter
    ... )

    >>> iterproxy.one() # fetch one
    S3Path('s3://bucket/larger-than-10MB-1.csv')

    >>> iterproxy.many(3) # fetch three
    [
        S3Path('s3://bucket/larger-than-10MB-1.csv'),
        S3Path('s3://bucket/larger-than-10MB-2.csv'),
        S3Path('s3://bucket/larger-than-10MB-3.csv'),
    ]

    >>> for p in iterproxy: # iterate over the rest
    ...     print(p)
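
The ``one`` / ``many`` / iterate-the-rest pattern above relies on a lazy iterator that is consumed as results are fetched. Here is a minimal sketch of that idea using only the standard library. It is not the library's implementation: ``FakeObject`` and ``IterProxy`` are hypothetical classes, and plain callables are used as predicates instead of the ``S3Path.size >= ...`` expression syntax.

```python
import itertools
import os
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class FakeObject:
    """Stand-in for an S3 object with just a key and a size."""
    key: str
    size: int

    @property
    def ext(self) -> str:
        return os.path.splitext(self.key)[1]


class IterProxy:
    """Lazy wrapper around an iterator with filter / one / many helpers."""

    def __init__(self, iterable: Iterable[FakeObject]):
        self._iterator = iter(iterable)

    def filter(self, *predicates: Callable[[FakeObject], bool]) -> "IterProxy":
        # Items are tested only when they are pulled, so nothing is listed eagerly.
        return IterProxy(
            item for item in self._iterator
            if all(predicate(item) for predicate in predicates)
        )

    def one(self) -> FakeObject:
        return next(self._iterator)

    def many(self, n: int) -> List[FakeObject]:
        return list(itertools.islice(self._iterator, n))

    def __iter__(self) -> Iterator[FakeObject]:
        return self._iterator


objects = [FakeObject(f"data-{i}.csv", i * 5_000_000) for i in range(6)]
objects.append(FakeObject("big.json", 50_000_000))

proxy = IterProxy(objects).filter(
    lambda obj: obj.size >= 10_000_000,
    lambda obj: obj.ext == ".csv",
)
first = proxy.one()          # data-2.csv
next_two = proxy.many(2)     # data-3.csv, data-4.csv
rest = list(proxy)           # data-5.csv
print(first.key, [o.key for o in next_two], [o.key for o in rest])
```

Because the proxy shares one underlying iterator, each fetch continues where the previous one stopped, which is why "the rest" excludes items already returned.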

**File Like Object for Simple IO**

``S3Path`` is a file-like object. It supports ``open`` and the context manager syntax out of the box. Here are a few highlights:

.. code-block:: python

    # Stream big file by line
    >>> p = S3Path("bucket", "log.txt")
    >>> with p.open("r") as f:
    ...     for line in f:
    ...         print(line) # do whatever you want with each line

    # JSON io
    >>> import json
    >>> p = S3Path("bucket", "config.json")
    >>> with p.open("w") as f:
    ...     json.dump({"password": "mypass"}, f)

    # pandas IO
    >>> import pandas as pd
    >>> p = S3Path("bucket", "dataset.csv")
    >>> df = pd.DataFrame(...)
    >>> with p.open("w") as f:
    ...     df.to_csv(f)
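
One practical consequence of the file-like interface is that code written against ordinary text streams can be reused unchanged. The sketch below uses ``io.StringIO`` as a stand-in for the object returned by ``open`` so that it runs without AWS access; ``count_lines_containing`` and ``dump_config`` are hypothetical helpers, and whether a given S3-backed stream supports every file method is an assumption to verify against the full documentation.

```python
import io
import json
from typing import IO


def count_lines_containing(stream: IO[str], needle: str) -> int:
    """Scan a text stream line by line, holding one line in memory at a time."""
    return sum(1 for line in stream if needle in line)


def dump_config(stream: IO[str], config: dict) -> None:
    """Serialize a config mapping as JSON into any writable text stream."""
    json.dump(config, stream, sort_keys=True)


# io.StringIO stands in for a stream opened on an S3 object.
log = io.StringIO("INFO start\nERROR disk full\nINFO done\nERROR timeout\n")
error_count = count_lines_containing(log, "ERROR")

buffer = io.StringIO()
dump_config(buffer, {"region": "us-east-1", "retries": 3})
serialized = buffer.getvalue()

print(error_count)  # 2
print(serialized)
```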

Now that you have a basic understanding of s3pathlib, let's read the `full document <https://s3pathlib.readthedocs.io/en/latest/#comprehensive-guide>`_ to explore its capabilities in greater depth.


Getting Help
------------------------------------------------------------------------------
Please use the ``python-s3pathlib`` tag on Stack Overflow to get help.

Submit an ``I want help`` issue ticket on `GitHub Issues <https://github.com/aws-samples/s3pathlib-project/issues/new/choose>`_


Contributing
------------------------------------------------------------------------------
Please see the `Contribution Guidelines <https://github.com/aws-samples/s3pathlib-project/blob/main/CONTRIBUTING.rst>`_.


Copyright
------------------------------------------------------------------------------
s3pathlib is an open source project. See the `LICENSE <https://github.com/aws-samples/s3pathlib-project/blob/main/LICENSE>`_ file for more information.
Raw data
{
"_id": null,
"home_page": "https://github.com/aws-samples/s3pathlib-project",
"name": "s3pathlib",
"maintainer": "Sanhe Hu",
"docs_url": null,
"requires_python": ">=3.7",
"maintainer_email": "sanhehu@amazon.com",
"keywords": null,
"author": "Sanhe Hu",
"author_email": "husanhe@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/62/0e/cdcffbc8ebed95a54d39bb26411c0fb64c7daea90223676f862bdb2a1191/s3pathlib-2.3.1.tar.gz",
"platform": "Windows",
"description": "(reStructuredText README; verbatim duplicate of the document text above, omitted)",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Objective Oriented Interface for AWS S3, similar to pathlib.",
"version": "2.3.1",
"project_urls": {
"Download": "https://pypi.python.org/pypi/s3pathlib/2.3.1#downloads",
"Homepage": "https://github.com/aws-samples/s3pathlib-project"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "2de59ba187c818eed81afdecbc3a5903ebb7df5c27ad19c42acd9355cb903fcd",
"md5": "cef9ba3d62af844f9fc923cffb2011a9",
"sha256": "156aa47cf3d41dd622c317a1bb54182b3d83799916d9f088e16557ba28e40496"
},
"downloads": -1,
"filename": "s3pathlib-2.3.1-py2.py3-none-any.whl",
"has_sig": false,
"md5_digest": "cef9ba3d62af844f9fc923cffb2011a9",
"packagetype": "bdist_wheel",
"python_version": "py2.py3",
"requires_python": ">=3.7",
"size": 76168,
"upload_time": "2024-08-31T04:48:13",
"upload_time_iso_8601": "2024-08-31T04:48:13.454760Z",
"url": "https://files.pythonhosted.org/packages/2d/e5/9ba187c818eed81afdecbc3a5903ebb7df5c27ad19c42acd9355cb903fcd/s3pathlib-2.3.1-py2.py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "620ecdcffbc8ebed95a54d39bb26411c0fb64c7daea90223676f862bdb2a1191",
"md5": "31ccf64a7841bc4e51c863205c3bea53",
"sha256": "fd0fe06ef0d67e94ee3626d893deb9583a8a3b5a4e8e45fb183cae398bf60b6f"
},
"downloads": -1,
"filename": "s3pathlib-2.3.1.tar.gz",
"has_sig": false,
"md5_digest": "31ccf64a7841bc4e51c863205c3bea53",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.7",
"size": 63088,
"upload_time": "2024-08-31T04:48:15",
"upload_time_iso_8601": "2024-08-31T04:48:15.389401Z",
"url": "https://files.pythonhosted.org/packages/62/0e/cdcffbc8ebed95a54d39bb26411c0fb64c7daea90223676f862bdb2a1191/s3pathlib-2.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-08-31 04:48:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aws-samples",
"github_project": "s3pathlib-project",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"requirements": [],
"tox": true,
"lcname": "s3pathlib"
}