# Smart Cloud Tag
Automatically tag cloud files across AWS, Azure, and Google Cloud with minimal effort.
## Use Case
`smart-cloud-tag` is a multi-cloud tagging tool that uses LLMs (GenAI) to apply tags to objects in batch across AWS S3, Azure Blob Storage, and Google Cloud Storage. It automates the whole workflow, from reading each file's content to applying the tags. LLMs are well suited to predicting tags from file content, so you no longer need to open files one by one to add metadata, or build a custom solution of your own. Tagging a batch of objects in the cloud of your choice now takes less time and effort than making your morning cup of coffee.
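This README does not document the package internals. The sketch below shows one plausible shape of the per-file step (read content, prompt the LLM, keep valid tags), under stated assumptions: `suggest_tags`, `fake_llm`, the prompt wording, and the JSON reply format are all hypothetical, and the LLM is replaced by a plain function so the example runs offline. Only the `max_bytes` limit (default 5000) and the allowed-values schema come from this README.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical: any function that takes a prompt and returns the model's text reply.
LLMFn = Callable[[str], str]


def suggest_tags(
    filename: str,
    raw: bytes,
    schema: Dict[str, Optional[List[str]]],
    llm: LLMFn,
    max_bytes: int = 5000,
) -> Dict[str, str]:
    """Suggest schema-valid tags for one file using an LLM (illustrative sketch)."""
    # 1. Read only the first max_bytes; undecodable bytes (such as a
    #    multi-byte character cut in half) are dropped.
    content = raw[:max_bytes].decode("utf-8", errors="ignore")

    # 2. Build a prompt from the file name, the schema, and the content.
    prompt = (
        f"File name: {filename}\n"
        f"Tag schema (null means any value): {json.dumps(schema)}\n"
        "Reply with a JSON object mapping each tag key to one value.\n\n"
        f"{content}"
    )

    # 3. Ask the LLM and parse its JSON reply.
    reply = json.loads(llm(prompt))

    # 4. Keep only schema keys whose values are allowed (None = any value).
    tags: Dict[str, str] = {}
    for key, allowed in schema.items():
        value = reply.get(key)
        if value is None:
            continue  # the model gave no value for this key
        value = str(value)
        if allowed is None or value in allowed:
            tags[key] = value
    return tags


def fake_llm(prompt: str) -> str:
    # Offline stand-in for a provider call; "memo" is not an allowed document_type.
    return '{"protected_health_information": "T", "document_type": "memo"}'


schema = {
    "protected_health_information": ["T", "F"],
    "document_type": ["chat_transcript", "lab_summary", "claim"],
}
print(suggest_tags("visit.txt", b"Patient reports a mild fever.", schema, fake_llm))
# {'protected_health_information': 'T'}
```

Validating the reply against the schema, rather than trusting it, is what keeps an off-list answer such as `"memo"` from ever reaching the bucket.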
## Features
- **Multi-Cloud Support**: AWS S3, Azure Blob Storage, Google Cloud Storage
- **AI-Powered**: Uses OpenAI, Anthropic Claude, or Google Gemini for intelligent tagging
- **File Type Support**: Currently supports .txt, .json, .csv, and .md files
- **Simple & Flexible**: Designed to work out-of-the-box while remaining flexible for custom requirements
- **Auto-Detection**: Automatically detects storage provider from URI prefix
- **Batch Processing**: Process multiple files with one command
- **Preview Mode**: Preview tags before applying them (optional)
- **Custom Prompts**: Ability to use your own custom LLM prompt templates (optional)
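To see which objects in a bucket fall under the supported file types, a filter like the one below can help. It is a minimal sketch, assuming support is decided purely by file extension; `taggable_keys` is a hypothetical helper, and treating extensions case-insensitively is an assumption, not documented behavior.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Extensions listed under "File Type Support" above.
SUPPORTED_EXTENSIONS = {".txt", ".json", ".csv", ".md"}


def taggable_keys(keys: Iterable[str]) -> List[str]:
    """Return object keys whose (lower-cased) extension is a supported type."""
    return [
        key
        for key in keys
        if PurePosixPath(key).suffix.lower() in SUPPORTED_EXTENSIONS
    ]


keys = ["notes/visit.txt", "scan.pdf", "data/claims.CSV", "README.md", "archive.tar.gz"]
print(taggable_keys(keys))
# ['notes/visit.txt', 'data/claims.CSV', 'README.md']
```

`PurePosixPath` is used because object keys always use `/` as a separator, regardless of the local operating system.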
## Quick Start
### Installation
#### Basic Installation
```bash
pip install smart_cloud_tag
```
> **Note:** The basic installation includes AWS S3 and OpenAI support and requires Python 3.8 or later. For other cloud or LLM providers, install the optional dependencies below.
#### Installation with Optional Dependencies
You can install additional dependencies based on your needs:
```bash
# Install with all optional dependencies (recommended)
pip install "smart_cloud_tag[all]"
# Install with specific cloud providers
pip install "smart_cloud_tag[aws]"        # AWS S3 (included by default)
pip install "smart_cloud_tag[azure]"      # Azure Blob Storage
pip install "smart_cloud_tag[gcp]"        # Google Cloud Storage
# Install with specific LLM providers
pip install "smart_cloud_tag[openai]"     # OpenAI (included by default)
pip install "smart_cloud_tag[anthropic]"  # Anthropic Claude
pip install "smart_cloud_tag[gemini]"     # Google Gemini
# Combine multiple options
pip install "smart_cloud_tag[azure,anthropic]"  # Azure + Anthropic
pip install "smart_cloud_tag[gcp,gemini]"       # GCP + Gemini
```
**Installation Options:**
- `[all]` - Installs all optional dependencies (all cloud providers + LLM providers)
- `[aws]` - AWS S3 support (included by default)
- `[azure]` - Azure Blob Storage support
- `[gcp]` - Google Cloud Storage support
- `[openai]` - OpenAI LLM support (included by default)
- `[anthropic]` - Anthropic Claude LLM support
- `[gemini]` - Google Gemini LLM support
- `[dev]` - Development dependencies (testing, linting, formatting)
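To check which extras are already present in an environment, you can probe for the SDKs they pull in. This is a sketch under an explicit assumption: the extra-to-module mapping below (`boto3`, `azure.storage.blob`, `google.cloud.storage`, `openai`, `anthropic`, `google.generativeai`) lists the usual official SDKs and is not confirmed by this README; `installed_extras` is a hypothetical helper.

```python
import importlib.util
from typing import Dict

# Assumed mapping from each extra to the import name of the SDK it installs.
EXTRA_MODULES = {
    "aws": "boto3",
    "azure": "azure.storage.blob",
    "gcp": "google.cloud.storage",
    "openai": "openai",
    "anthropic": "anthropic",
    "gemini": "google.generativeai",
}


def installed_extras() -> Dict[str, bool]:
    """Report, per extra, whether its (assumed) SDK module can be found."""
    status = {}
    for extra, module in EXTRA_MODULES.items():
        try:
            # find_spec imports parent packages first, so a missing parent
            # (e.g. "azure") raises ModuleNotFoundError instead of returning None.
            status[extra] = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            status[extra] = False
    return status


missing = [extra for extra, ok in installed_extras().items() if not ok]
if missing:
    # Quoted so that zsh does not treat the brackets as a glob pattern.
    print(f'pip install "smart_cloud_tag[{",".join(missing)}]"')
```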
### Basic Usage
```python
from smart_cloud_tag import SmartCloudTagger

# Initialize the tagger (Azure requires the [azure] extra)
tagger = SmartCloudTagger(
    storage_uri="az://telehealthcanada",  # target Azure Blob container
    tags={
        "protected_health_information": ["T", "F"],  # allowed values: T/F
        "document_type": ["chat_transcript", "lab_summary", "claim"],
    },  # tag schema: key -> allowed values
)

# Preview tags before applying (optional)
preview_result = tagger.preview_tags()
print(f"Preview: {preview_result.summary}")

# Apply tags
result = tagger.apply_tags()
print(f"Applied tags to {result.summary['applied']} objects")
```
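Before running a schema against a whole bucket, it can be worth checking it against the provider's tag limits. S3 object tags and Azure blob index tags both allow at most 10 tags per object, keys of 1 to 128 characters, and values of up to 256 characters. Whether `smart-cloud-tag` writes these tag types (rather than, for example, GCS object metadata, which has different limits) is an assumption here, and `check_schema` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Limits shared by S3 object tags and Azure blob index tags.
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def check_schema(schema: Dict[str, Optional[List[str]]]) -> List[str]:
    """Return human-readable problems with a tag schema (empty if none)."""
    problems = []
    if len(schema) > MAX_TAGS:
        problems.append(f"{len(schema)} keys exceeds the {MAX_TAGS}-tag limit")
    for key, allowed in schema.items():
        if not 1 <= len(key) <= MAX_KEY_LEN:
            problems.append(f"key {key!r} must be 1-{MAX_KEY_LEN} characters")
        if allowed is None:
            continue  # free-form key: the LLM picks the value
        if not allowed:
            problems.append(f"key {key!r} has an empty list of allowed values")
        for value in allowed:
            if len(value) > MAX_VALUE_LEN:
                problems.append(f"value for {key!r} exceeds {MAX_VALUE_LEN} characters")
    return problems


schema = {
    "protected_health_information": ["T", "F"],
    "document_type": ["chat_transcript", "lab_summary", "claim"],
}
print(check_schema(schema))
# []
```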
## Configuration
### Environment Variables
Create a `.env` file:
```bash
# LLM provider API key (the same variable is used for every LLM provider)
API_KEY=your_api_key_here
# AWS (if using S3)
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
# Azure (if using Blob Storage)
AZURE_STORAGE_CONNECTION_STRING=your_connection_string
# Google Cloud (if using GCS)
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
```
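A quick pre-flight check can catch a missing variable before a batch run. This sketch reuses the variable names from the `.env` example above and assumes they are already in the process environment (whether the package loads `.env` itself is not stated here). Note that boto3 and Google's client libraries can also find credentials elsewhere, such as `~/.aws/credentials` or an attached service account, so a "missing" variable is not necessarily an error. `missing_env` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Variable names from the .env example, keyed by storage URI scheme.
REQUIRED_ENV = {
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "az": ["AZURE_STORAGE_CONNECTION_STRING"],
    "gs": ["GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env(storage_uri: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List unset or empty variables needed for this URI (API_KEY is always needed)."""
    env = os.environ if env is None else env
    scheme = storage_uri.split("://", 1)[0]
    required = ["API_KEY"] + REQUIRED_ENV.get(scheme, [])
    return [name for name in required if not env.get(name)]


print(missing_env("az://telehealthcanada", {"API_KEY": "sk-test"}))
# ['AZURE_STORAGE_CONNECTION_STRING']
```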
### Supported Storage Providers
| Provider | URI Format | Example |
|----------|------------|---------|
| AWS S3 | `s3://bucket` | `s3://my-documents` |
| Azure Blob | `az://container` | `az://documents` |
| Google Cloud | `gs://bucket` | `gs://my-files` |
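Auto-detection from the URI prefix amounts to a lookup on the scheme. The sketch below is an illustration of that idea, not the package's code: `parse_storage_uri`, `StorageTarget`, and the provider labels are hypothetical, and because this README does not say whether a path prefix after the bucket is accepted, the sketch simply ignores anything after the first segment.

```python
from typing import NamedTuple

# URI scheme -> provider label (labels are illustrative).
SCHEMES = {"s3": "aws", "az": "azure", "gs": "gcp"}


class StorageTarget(NamedTuple):
    provider: str
    container: str  # S3/GCS bucket name or Azure Blob container name


def parse_storage_uri(uri: str) -> StorageTarget:
    """Split a storage URI into a provider label and bucket/container name."""
    scheme, sep, rest = uri.partition("://")
    if not sep or scheme not in SCHEMES:
        raise ValueError(f"unsupported storage URI {uri!r}; expected s3://, az://, or gs://")
    # Use the first path segment as the bucket/container name.
    container = rest.split("/", 1)[0]
    if not container:
        raise ValueError(f"missing bucket or container name in {uri!r}")
    return StorageTarget(SCHEMES[scheme], container)


print(parse_storage_uri("az://documents"))
# StorageTarget(provider='azure', container='documents')
```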
### Supported LLM Providers
| Provider | Default Model | Environment Variable |
|----------|---------------|---------------------|
| OpenAI | `gpt-5` | `API_KEY` |
| Anthropic | `claude-3-opus-4.1` | `API_KEY` |
| Google Gemini | `gemini-1.5-pro` | `API_KEY` |
## Advanced Usage
### SmartCloudTagger Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `storage_uri` | `str` | Yes | - | Storage location URI (s3://, az://, or gs://) |
| `tags` | `Dict[str, Optional[List[str]]]` | Yes | - | Tag schema mapping each tag key to its list of allowed values. If a key's value is `None`, the LLM infers a suitable value |
| `llm_model` | `str` | No | Provider-specific | LLM model to use (see the default models below) |
| `llm_provider` | `str` | No | `"openai"` | LLM provider: `"openai"`, `"anthropic"`, or `"gemini"` |
| `max_bytes` | `Optional[int]` | No | `5000` | Maximum number of bytes read from each object |
| `custom_prompt_template` | `Optional[str]` | No | Built-in template in `config.py` | Custom prompt template. It must include the placeholders `{content}`, `{filename}`, and `{tags}` (see `config.py` for the default) |
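A custom template can be checked for the three required placeholders before it is passed in. This sketch assumes the template is filled with Python's `str.format`, which the README implies through the `{content}`-style placeholders but does not state; under that assumption, literal braces (for example an inline JSON sample) must be doubled as `{{` and `}}`. `missing_placeholders` is a hypothetical helper.

```python
from string import Formatter
from typing import Set

REQUIRED_PLACEHOLDERS = {"content", "filename", "tags"}


def missing_placeholders(template: str) -> Set[str]:
    """Return required placeholders that a str.format-style template lacks."""
    found = set()
    for _literal, field, _spec, _conversion in Formatter().parse(template):
        if field:
            # parse() already separates "!r" conversions and ":" format specs;
            # also strip attribute/index access such as "{tags[0]}".
            found.add(field.split(".", 1)[0].split("[", 1)[0])
    return REQUIRED_PLACEHOLDERS - found


MY_TEMPLATE = (
    "Classify the file {filename}.\n"
    "Allowed tags: {tags}\n"
    'Reply with JSON such as {{"document_type": "claim"}}.\n\n'
    "{content}"
)
print(missing_placeholders(MY_TEMPLATE))                 # set()
print(missing_placeholders("Tag {filename}: {content}"))  # {'tags'}
```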
### Default Models by Provider
| Provider | Default Model |
|----------|---------------|
| OpenAI | `gpt-5` |
| Anthropic | `claude-3-opus-4.1` |
| Google Gemini | `gemini-1.5-pro` |
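The parameter table implies a simple precedence: an explicit `llm_model` overrides the provider's default. The sketch below restates that rule using the defaults listed above; `resolve_model` is a hypothetical helper, and the model strings are copied from this README rather than checked against each provider's current model list.

```python
from typing import Optional

# Defaults as listed in the table above.
DEFAULT_MODELS = {
    "openai": "gpt-5",
    "anthropic": "claude-3-opus-4.1",
    "gemini": "gemini-1.5-pro",
}


def resolve_model(llm_provider: str = "openai", llm_model: Optional[str] = None) -> str:
    """Return llm_model if given, otherwise the provider's documented default."""
    try:
        default = DEFAULT_MODELS[llm_provider]
    except KeyError:
        raise ValueError(f"llm_provider must be one of {sorted(DEFAULT_MODELS)}") from None
    return llm_model or default


print(resolve_model())          # gpt-5
print(resolve_model("gemini"))  # gemini-1.5-pro
```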
### Different LLM Providers
```python
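from smart_cloud_tag import SmartCloudTagger

# Tag schema shared by both examples
tags = {"document_type": ["chat_transcript", "lab_summary", "claim"]}

# Using Anthropic Claude (requires the [anthropic] extra)
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    llm_provider="anthropic",
)

# Using Google Gemini (requires the [gemini] extra)
tagger = SmartCloudTagger(
    storage_uri="s3://my-bucket",
    tags=tags,
    llm_provider="gemini",
)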
```
## Development
### Installation from Source
```bash
git clone https://github.com/DawarWaqar/smart_cloud_tag.git
cd smart_cloud_tag
pip install -e ".[all]"
```
**Note**: Keep the quotes around `".[all]"`; without them, zsh and some other shells treat the square brackets as a glob pattern and the command fails.
### Running Tests
```bash
python -m pytest tests/
```
## License
MIT License - see [LICENSE](LICENSE) file for details.
## Support
- 📧 Email: dawarwaqar71@gmail.com
- 🐛 Issues: [GitHub Issues](https://github.com/DawarWaqar/smart_cloud_tag/issues)