azurebatchload

Name: azurebatchload
Version: 0.6.3
Home page: https://github.com/zypp-io/azure-batch-load
Summary: Download and upload files in batches from Azure Blob Storage Containers
Upload time: 2023-06-12 07:56:17
Author: Erfan Nariman, Melvin Folkers
Requires Python: >=3.7
Keywords: python, azure, blob, download, upload, batch
Requirements: none recorded
            <p align="center">
  <img alt="logo" src="https://www.zypp.io/static/assets/img/logos/zypp/white/500px.png"  width="200"/>
</p>

[![Downloads](https://pepy.tech/badge/azurebatchload)](https://pepy.tech/project/azurebatchload)
[![PyPi](https://img.shields.io/pypi/v/azurebatchload.svg)](https://pypi.python.org/pypi/azurebatchload)
[![Open Source](https://badges.frapsoft.com/os/v1/open-source.svg?v=103)](https://opensource.org/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

# Azure Batch Load
High-level Python wrapper for the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/) to download or upload files in batches from or to Azure Blob Storage containers.
This project aims to provide the [missing functionality](https://github.com/Azure/azure-storage-python/issues/554)
in the Azure Storage Python SDK, which offers no way to download or upload batches of files from or to containers.
The only option in the Azure Storage Python SDK is transferring file by file, which takes a lot of time.

Besides batch loads, since version `0.0.5` it is possible to set the method to `single`, which uses the
[Azure Python SDK](https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob) to process files one by one.


# Installation

```commandline
pip install azurebatchload
```

See [PyPi](https://pypi.org/project/azurebatchload/) for package index.

**Note**: For batch uploads (`method="batch"`) Azure CLI has to be [installed](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
and [configured](https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli).
Check if Azure CLI is installed through terminal:

```commandline
az --version
```

# Requirements

The Azure Storage connection string must be set as the environment variable `AZURE_STORAGE_CONNECTION_STRING`, or
the separate environment variables `AZURE_STORAGE_KEY` and `AZURE_STORAGE_NAME` must be set, which will be used to create the connection string.
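
If the variables are not set at the shell level, they can also be set programmatically before using the library. A minimal sketch (the placeholder values are obviously not real credentials and must be replaced with your storage account's own):

```python
import os

# Hypothetical placeholders; substitute your storage account's real values
os.environ["AZURE_STORAGE_KEY"] = "<account-key>"
os.environ["AZURE_STORAGE_NAME"] = "<account-name>"
```

Setting credentials in code is only convenient for local experiments; for anything shared, prefer shell-level environment variables or a secrets manager.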

# Usage

## Download
### 1. Using the standard environment variables

Azure-batch-load automatically checks for the environment variables `AZURE_STORAGE_CONNECTION_STRING`,
   `AZURE_STORAGE_KEY` and `AZURE_STORAGE_ACCOUNT`.
So if the connection string, or the storage key plus storage account, are set as environment variables,
   we can leave the arguments `connection_string`, `account_key` and `account_name` empty:

```python
from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   extension='.pdf'
).download()
```

### 2. Using `method="single"`

We can skip the Azure CLI and use only the Python SDK by setting `method="single"`:

```python
from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   extension='.pdf',
   method='single'
).download()
```

### 3. Download a specific folder from a container

We can download a folder by setting the `folder` argument. This works both for `single` and `batch`.

```python
from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   folder='uploads/invoices/',
   extension='.pdf',
   method='single'
).download()
```

### 4. Download a given list of files

We can give a list of files to download with the `list_files` argument.
Note, this only works with `method='single'`.

```python
from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   folder='uploads/invoices/',
   list_files=["invoice1.pdf", "invoice2.pdf"],
   method='single'
).download()
```

## Upload

### 1. Using the standard environment variables

```python
from azurebatchload import Upload

Upload(
   destination='blobcontainername',
   source='../pdf',
   extension='*.pdf'
).upload()
```

### 2. Using `method="single"`, which does not require the Azure CLI.

```python
from azurebatchload import Upload

Upload(
   destination='blobcontainername',
   source='../pdf',
   extension='*.pdf',
   method="single"
).upload()
```

### 3. Upload a given list of files with the `list_files` argument.

```python
from azurebatchload import Upload

Upload(
   destination='blobcontainername',
   source='../pdf',
   list_files=["invoice1.pdf", "invoice2.pdf"],
   method="single"
).upload()
```

## List blobs

With the `Utils.list_blobs` method we can do advanced listing of blobs in a container or in a specific folder within a container.
Several arguments are available to define the scope of information:

- `name_starts_with`: filter files with a certain prefix, or select certain folders, e.g. `name_starts_with="folder1/subfolder/lastfolder/"`.
- `dataframe`: whether to return the information as a pandas dataframe instead of a list.
- `extended_info`: whether to return just the blob names or extended information such as size, creation date and modified date.

### 1. List a whole container with just the filenames as a list.
```python
from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()
```

### 2. List a whole container with just the filenames as a dataframe.
```python
from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   dataframe=True
).list_blobs()
```

### 3. List a folder in a container.
```python
from azurebatchload import Utils

list_blobs = Utils(
   container='containername',
   name_starts_with="foldername/"
).list_blobs()
```

### 4. Get extended information about a folder.
```python
from azurebatchload import Utils

dict_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True
).list_blobs()
```

### 5. Get extended information about a folder, returned as a pandas dataframe.
```python
from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True,
   dataframe=True
).list_blobs()
```
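
The returned structures can be post-processed like any ordinary Python data. As a downstream sketch (the field names `name` and `size` are assumptions for illustration, since the exact schema of the extended info is not shown above):

```python
# Hypothetical sample mimicking extended blob info; real field names may differ
blobs = [
    {"name": "foldername/invoice1.pdf", "size": 52_341},
    {"name": "foldername/invoice2.pdf", "size": 1_204_992},
]

# Keep only blobs larger than roughly 1 MB
large = [b["name"] for b in blobs if b["size"] > 1_000_000]
print(large)  # ['foldername/invoice2.pdf']
```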
