| Field | Value |
|---|---|
| Name | pdtrain |
| Version | 0.1.0 |
| Summary | Pipedream Training Orchestrator CLI - Train ML models on AWS SageMaker |
| Upload time | 2025-10-07 14:48:06 |
| Requires Python | >=3.8 |
| License | MIT |
| Keywords | ml, training, sagemaker, pipedream, cli |
# pdtrain
**Pipedream Training Orchestrator CLI** - Train ML models on AWS SageMaker with ease.
## Installation
```bash
pip install pdtrain
```
## Quick Start
### 1. Configure
```bash
pdtrain configure
```
This will prompt you for:
- API URL (default: `http://localhost:8000`)
- API Key (from Pipedream dashboard)
### 2. Upload Training Code
```bash
# Upload a directory (will auto-create tar.gz)
pdtrain bundle upload ./my-training-code --wait
# Or upload existing tar.gz file
pdtrain bundle upload ./my-training-code.tar.gz --wait
```
### 3. Upload Dataset
```bash
pdtrain dataset upload ./data.csv --name "train-data" --wait
```
### 4. Create and Run Training
```bash
# Get bundle and dataset IDs first
pdtrain bundle list
pdtrain dataset list
# Create run using IDs
pdtrain run create \
  --bundle ca8912d6-79a4-4ea9-8570-234ec1baeef1 \
  --dataset ds_abc123-def456-7890 \
  --framework pytorch \
  --entry train.py \
  --submit --wait
```
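Inside the training container, SageMaker exposes the mounted input and output locations to the entry script through environment variables (`SM_CHANNEL_TRAINING` for the dataset, `SM_MODEL_DIR` for the artifact directory). A minimal `train.py` skeleton might start like this; the channel name `training` is an assumption about how pdtrain attaches the dataset, and the local fallback paths are illustrative:

```python
import os


def resolve_paths(env=os.environ):
    """Resolve SageMaker's standard input/output locations.

    SM_CHANNEL_TRAINING points at the mounted dataset and SM_MODEL_DIR
    at the directory whose contents are uploaded as the model artifact;
    local fallbacks keep the script runnable outside SageMaker too.
    """
    data_dir = env.get("SM_CHANNEL_TRAINING", "./data")
    model_dir = env.get("SM_MODEL_DIR", "./model")
    return data_dir, model_dir


if __name__ == "__main__":
    data_dir, model_dir = resolve_paths()
    print(f"reading data from {data_dir}, writing model to {model_dir}")
    # ... training loop goes here ...
```

Writing the trained model into `model_dir` is what makes it show up later under `pdtrain artifacts list`.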
### 5. View Results
```bash
# View logs
pdtrain logs run-abc123
# List artifacts
pdtrain artifacts list run-abc123
# Download artifacts
pdtrain artifacts download run-abc123 --output ./results/
```
## Commands
### Bundle Management
```bash
# Upload directory (auto-creates tar.gz)
pdtrain bundle upload ./my-training-code --wait
# Upload existing tar.gz
pdtrain bundle upload ./code.tar.gz --name "my-model" --wait
# List bundles
pdtrain bundle list
# Show bundle details
pdtrain bundle show abc-123
```
### Dataset Management
```bash
# Upload dataset
pdtrain dataset upload ./data.csv --name "train-data" --wait
# List datasets
pdtrain dataset list
# Download dataset
pdtrain dataset download ds-456 --version 1 --output ./data/
```
### Training Runs
```bash
# Create run
pdtrain run create \
  --bundle my-model:v1.0.0 \
  --dataset train-data:1 \
  --framework pytorch \
  --entry train.py
# Submit run
pdtrain run submit run-abc123
# List runs
pdtrain run list
# Show run details
pdtrain run show run-abc123
# Watch run progress
pdtrain run watch run-abc123
# Stop run
pdtrain run stop run-abc123
```
### Logs & Artifacts
```bash
# View logs (last 300 lines by default)
pdtrain logs run-abc123 --lines 500
# Follow logs in real-time
pdtrain logs run-abc123 --follow --interval 5
# List artifacts
pdtrain artifacts list run-abc123
# Download artifacts (defaults to ./artifacts/<run_id>/)
pdtrain artifacts download run-abc123 --output ./results/
```
### Quota
```bash
# Check storage quota
pdtrain quota
```
## Configuration
Configuration is stored in `~/.pdtrain/config.json`.
You can also use environment variables:
```bash
export PDTRAIN_API_URL=https://ml-orchestrator.pipedream.in
export PDTRAIN_API_KEY=sdk_xxxxx
```
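For reference, the config file written by `pdtrain configure` holds the same two values as the environment variables above. The exact key names are an assumption (not confirmed by this README), but the file would look roughly like:

```json
{
  "api_url": "https://ml-orchestrator.pipedream.in",
  "api_key": "sdk_xxxxx"
}
```

Environment variables, when set, typically take precedence over the file, which is convenient in CI.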
## Examples
### Complete Workflow
```bash
# 1. Upload code (from directory)
pdtrain bundle upload ./resnet-training --wait
# 2. Upload dataset
pdtrain dataset upload ./cifar10.csv --name "cifar10" --wait
# 3. Get IDs
pdtrain bundle list # Copy bundle ID
pdtrain dataset list # Copy dataset ID
# 4. Run training (use IDs from step 3)
pdtrain run create \
  --bundle <bundle-id> \
  --dataset <dataset-id> \
  --framework pytorch \
  --entry train.py \
  --submit --wait
# 5. Download results
pdtrain artifacts download <run-id> --output ./results/
```
### Docker Mode
```bash
pdtrain run create \
  --bundle my-code:latest \
  --image 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.2.0-cpu \
  --entry train.py \
  --submit
```
### Script Mode with Hyperparameters
```bash
pdtrain run create \
  --bundle my-code:latest \
  --framework pytorch \
  --framework-version 2.2.0 \
  --hyperparameter epochs=10 \
  --hyperparameter batch_size=32 \
  --submit
```
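In SageMaker script mode, hyperparameters are forwarded to the entry script as command-line flags (so `epochs=10` arrives as `--epochs 10`), which means `train.py` can pick them up with `argparse`. A sketch, with defaults chosen only for illustration:

```python
import argparse


def parse_hyperparameters(argv=None):
    # SageMaker script mode forwards each hyperparameter as a CLI flag,
    # e.g. --hyperparameter epochs=10 becomes "--epochs 10" here.
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=16)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_hyperparameters()
    print(f"epochs={args.epochs} batch_size={args.batch_size}")
```

Keeping defaults on every flag lets the same script run locally with no arguments at all.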
## Development
```bash
# Clone repository
git clone https://github.com/pipedream/pdtrain
cd pdtrain
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black pdtrain/
```
## License
MIT License - see LICENSE file for details.
## Support
- API Keys: https://pipedream.in/api-keys
- Issues: https://github.com/pipedream/pdtrain/issues
- Email: hello@pipedream.in