pdcloud


Name: pdcloud
Version: 0.4.1
Summary: Python pandas dataframe cloud agnostic storage
Upload time: 2023-12-10 13:32:06
Author: Vitali M.
Requires Python: >=3.8
License: MIT
# pdcloud

`pdcloud` is a Python package designed to simplify and accelerate the onboarding of data stored in cloud environments. It's built to be cloud-agnostic, allowing seamless access to dataframes stored across various cloud platforms.

## Simplifying Cloud Data Access with `pdcloud`

`pdcloud` offers a unified interface to interact with multiple cloud storage providers, abstracting away the complexities of dealing with different cloud-specific APIs. Key advantages include:

- **Cloud-Agnostic Interface**: One interface to access data across Azure, AWS, GCP, and more, removing the need to understand each cloud provider's specifics.
- **Streamlined Data Operations**: Whether reading or writing data, `pdcloud` provides a consistent, intuitive API, simplifying cloud data operations.
- **Optimized Data Handling**: Leveraging PyArrow and Parquet, `pdcloud` ensures efficient, fast, and cost-effective data processing.

Through `pdcloud`, users gain a straightforward, efficient path to access and manipulate cloud-stored data, irrespective of the underlying cloud platform.
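
For illustration, the sketch below shows what this looks like in practice. `AzureStorage` and `Lib` are taken from the Usage section further down; the commented-out `S3Storage` adapter is hypothetical, since AWS support is only planned at this point (see Features).

```python
from pdcloud import AzureStorage, Lib

# Choose a storage adapter for wherever the data lives.
storage = AzureStorage("<azure-connection-string>")  # placeholder credentials
# storage = S3Storage(...)  # hypothetical: AWS S3 support is planned, not yet available

# Application code only talks to Lib, never to a provider-specific SDK.
lib = Lib(container="library", storage=storage)
df = lib.read("mydata")  # a pandas DataFrame, regardless of the backing cloud
```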

## Benefits of Using PyArrow and Parquet

`pdcloud` leverages the power of PyArrow and Parquet for data storage and processing, offering several key advantages:

- **Efficient Data Storage**: Parquet stores data in a columnar format, which is more space-efficient than row-based storage, especially for analytical queries.

- **Optimized for Performance**: PyArrow's columnar memory format enables fast data access and efficient in-memory computing, which is crucial for analytics.

- **Cross-platform Support**: Parquet is supported across multiple programming languages and platforms, ensuring compatibility and flexibility.

- **Scalability**: Ideal for handling large datasets, Parquet efficiently scales to accommodate massive volumes of data.

- **Data Compression**: Parquet supports various compression techniques, significantly reducing storage costs and improving I/O performance.

- **Schema Evolution**: Parquet supports schema evolution, allowing modification of the schema over time without the need to rewrite the dataset.

By using PyArrow and Parquet, `pdcloud` aims to store and access data in an efficient, performant, and cost-effective manner.
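
As a small, library-agnostic illustration of these points (plain pandas with the PyArrow engine, not `pdcloud` itself), a DataFrame round-trips through compressed Parquet and supports column-pruned reads:

```python
import pandas as pd

# Any DataFrame works the same way; this is just sample data.
df = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "GOOG"],
    "price": [191.2, 374.5, 139.7],
})

# Columnar, compressed on-disk representation (pyarrow is the default engine).
df.to_parquet("prices.parquet", compression="snappy")

# Analytical reads can pull back only the columns they need.
prices = pd.read_parquet("prices.parquet", columns=["price"])
print(prices)
```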

## Design Choices in `pdcloud`

`pdcloud` is crafted with the vision of simplifying data access across various cloud platforms. Key design choices include:

- **Unified API**: A single, intuitive interface for all cloud storage operations, regardless of the cloud provider.
- **Abstraction Layer**: Abstracts the complexities of each cloud provider's API, providing a seamless experience.
- **Cloud-Agnostic Approach**: Designed to be adaptable to different cloud environments, ensuring flexibility and broad applicability.
- **Optimized Data Processing**: Integration with PyArrow and Parquet for efficient data handling, suitable for both small and large-scale datasets.
- **Focus on Performance and Scalability**: Ensures efficient data operations, catering to the needs of both individual users and large enterprises.

These design choices reflect our commitment to providing a versatile, efficient, and user-friendly tool for cloud-based data management.
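
As a rough sketch of what such an abstraction layer can look like (illustrative only, not `pdcloud`'s actual internals), each provider adapter can be thought of as implementing a small storage protocol, keeping the front end free of provider-specific code:

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical adapter interface; method names are illustrative only."""

    def read_object(self, container: str, name: str) -> bytes:
        """Return the raw bytes of a single stored object."""
        ...

    def write_object(self, container: str, name: str, data: bytes, overwrite: bool = False) -> None:
        """Persist raw bytes as a single stored object."""
        ...


# Any adapter satisfying this protocol (Azure, S3, GCS, ...) could be handed
# to a Lib-like front end, which takes care of the Parquet (de)serialization.
```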

## Key Features

- **Cloud Agnostic**: Compatible with major cloud providers, enabling access to data regardless of its cloud location.
- **Efficient Data Onboarding**: Reduces the steps involved in data transfer and processing, moving away from traditional methods like SFTP/FTP.
- **Direct Data Access**: Facilitates direct access to data through simple cloud configurations and connection strings.
- **Standardized Data Format**: Utilizes Parquet format for data storage and retrieval, ensuring efficiency and uniformity.

## Motivation

The goal of `pdcloud` is to revolutionize how data providers share and users access data. By eliminating the cumbersome process of data transfer and storage, `pdcloud` enables users to onboard data swiftly and efficiently. Upon signing necessary data agreements, users can instantly access data provided by vendors through unique cloud configurations, significantly cutting down the time and resources typically spent on data integration.

## Features

- Cloud agnostic: Works with Azure Blob Storage, with planned support for AWS S3 and Google Cloud Storage.
- Asynchronous and synchronous read/write operations.
- Utilizes Apache Arrow for efficient data handling (see the sketch below).
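
The Arrow piece is standard PyArrow; independent of `pdcloud`, converting between pandas and Arrow looks like this:

```python
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# Columnar in-memory representation, cheap to serialize to Parquet.
table = pa.Table.from_pandas(df)
print(table.schema)

# And back to pandas when the analysis layer needs it.
df_again = table.to_pandas()
```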

## Installation

```bash
pip install pdcloud
```

## Usage

### Azure Storage Adapter

```python
import pandas as pd

from pdcloud import AzureStorage, Lib

# Initialize the Azure storage adapter
connection_string = ""  # placeholder: your Azure Storage connection string
azure_storage = AzureStorage(connection_string)

# Define the container name
container_name = "library"

# Create an instance of the Lib class
lib = Lib(container=container_name, storage=azure_storage)

# Read and process all data objects from the container
all_data: pd.DataFrame = lib.read_all()
print("All Data:", all_data)

# Read and process a specific data object from the container
data_object_name = "mydata"
specific_data: pd.DataFrame = lib.read(data_object_name)
print("Specific Data Object:", specific_data)

# A small sample DataFrame to write back
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.1, 0.2, 0.3]})

# Write a DataFrame to the container configured above
lib.write("mydata", data=df, overwrite=True)

# Write a DataFrame to a different container ("archive" is just an example name)
lib.write("mydata", container="archive", data=df, overwrite=True)
```

## Contributing

Contributions to `pdcloud` are welcome! Please read our contributing guidelines for details on how to contribute to the project.

## License

This project is licensed under the MIT License.

            
