# SageMaker HyperPod command-line interface
The Amazon SageMaker HyperPod command-line interface (HyperPod CLI) is a tool that helps manage clusters, training jobs, and inference endpoints on the SageMaker HyperPod clusters orchestrated by Amazon EKS.
This documentation serves as a reference for the available HyperPod CLI commands. For a comprehensive user guide, see [Orchestrating SageMaker HyperPod clusters with Amazon EKS](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks.html) in the *Amazon SageMaker Developer Guide*.
Note: The legacy `hyperpod` CLI (V2) has been moved to the `release_v2` branch. See the [release_v2 branch](https://github.com/aws/sagemaker-hyperpod-cli/tree/release_v2) for usage.
## Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Platform Support](#platform-support)
- [ML Framework Support](#ml-framework-support)
- [Installation](#installation)
- [Usage](#usage)
- [Getting Started](#getting-started)
- [CLI](#cli)
- [Cluster Management](#cluster-management)
- [Training](#training)
- [Inference](#inference)
- [Jumpstart Endpoint](#jumpstart-endpoint-creation)
- [Custom Endpoint](#custom-endpoint-creation)
- [SDK](#sdk)
- [Cluster Management](#cluster-management-sdk)
- [Training](#training-sdk)
- [Inference](#inference-sdk)
- [Examples](#examples)
## Overview
The SageMaker HyperPod CLI is a tool that helps create training jobs and inference endpoint deployments on Amazon SageMaker HyperPod clusters orchestrated by Amazon EKS. It provides commands for managing the full lifecycle of jobs (create, describe, list, and delete) as well as for accessing pod and operator logs where applicable. The CLI abstracts away the complexity of working directly with Kubernetes for these core job-management actions on SageMaker HyperPod clusters orchestrated by Amazon EKS.
## Prerequisites
### Region Configuration
**Important**: For commands that accept the `--region` option, if no region is explicitly provided, the command will use the default region from your AWS credentials configuration.
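If you want a different default, you can set the region on your AWS CLI profile before running HyperPod commands; for example:
```bash
# Set the default region for the active AWS profile (value is illustrative)
aws configure set region us-west-2
```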
### Prerequisites for Training
- HyperPod CLI currently supports starting PyTorchJobs. To start a job, you need to install the Training Operator first.
  - You can follow the [PyTorch operator documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-eks-operator-install.html) to install it.
### Prerequisites for Inference
- HyperPod CLI supports creating inference endpoints from SageMaker JumpStart models and from custom endpoint configurations.
  - You can follow the [inference operator documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-model-deployment-setup.html) to install it.
## Platform Support
SageMaker HyperPod CLI currently supports Linux and macOS. Windows is not currently supported.
## ML Framework Support
SageMaker HyperPod CLI currently supports starting training jobs with:
- PyTorch. Version requirement: PyTorch >= 1.10
## Installation
1. Make sure that your local Python version is 3.8, 3.9, 3.10, or 3.11.
2. Install the sagemaker-hyperpod-cli package.
```bash
pip install sagemaker-hyperpod
```
3. Verify that the installation succeeded by running the following command.
```bash
hyp --help
```
## Usage
The HyperPod CLI provides the following commands:
- [Getting Started](#getting-started)
- [CLI](#cli)
- [Cluster Management](#cluster-management)
- [Training](#training)
- [Inference](#inference)
- [Jumpstart Endpoint](#jumpstart-endpoint-creation)
- [Custom Endpoint](#custom-endpoint-creation)
- [SDK](#sdk)
- [Cluster Management](#cluster-management-sdk)
- [Training](#training-sdk)
- [Inference](#inference-sdk)
### Getting Started
#### Getting Cluster Information
This command lists the available SageMaker HyperPod clusters and their capacity information.
```bash
hyp list-cluster
```
| Option | Type | Description |
|--------|------|-------------|
| `--region <region>` | Optional | The region where the SageMaker HyperPod and EKS clusters are located. If not specified, the region from the current AWS credentials is used. |
| `--namespace <namespace>` | Optional | The namespace to check quota for. Only SageMaker managed namespaces are supported. |
| `--output <json\|table>` | Optional | The output format. Available values are `table` and `json`. The default value is `json`. |
| `--debug` | Optional | Enable debug mode for detailed logging. |
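For example, to list clusters in a specific region with table output (the region value here is illustrative):
```bash
hyp list-cluster --region us-west-2 --output table
```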
#### Connecting to a Cluster
This command configures the local kubectl environment to interact with the specified SageMaker HyperPod cluster and namespace.
```bash
hyp set-cluster-context --cluster-name <cluster-name>
```
| Option | Type | Description |
|--------|------|-------------|
| `--cluster-name <cluster-name>` | Required | The SageMaker HyperPod cluster name to configure with. |
| `--namespace <namespace>` | Optional | The namespace to connect to. If not specified, HyperPod CLI commands auto-discover an accessible namespace. |
| `--region <region>` | Optional | The AWS region where the HyperPod cluster resides. |
| `--debug` | Optional | Enable debug mode for detailed logging. |
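For example, to connect to a cluster in a specific namespace (cluster and namespace names are illustrative):
```bash
hyp set-cluster-context --cluster-name my-hyperpod-cluster --namespace my-namespace
```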
#### Getting Cluster Context
Get all context related to the currently set cluster.
```bash
hyp get-cluster-context
```
| Option | Type | Description |
|--------|------|-------------|
| `--debug` | Optional | Enable debug mode for detailed logging. |
## CLI
### Cluster Management
**Important**: For commands that accept the `--region` option, if no region is explicitly provided, the command will use the default region from your AWS credentials configuration.
**Cluster stack names must be unique within each AWS region.** If you attempt to create a cluster stack with a name that already exists in the same region, the deployment will fail.
#### Initialize Cluster Configuration
Initialize a new cluster configuration in the current directory:
```bash
hyp init cluster-stack
```
**Important**: The `resource_name_prefix` parameter in the generated `config.yaml` file serves as the primary identifier for all AWS resources created during deployment. Each deployment must use a unique resource name prefix to avoid conflicts. This prefix is automatically appended with a unique identifier during cluster creation to ensure resource uniqueness.
#### Configure Cluster Parameters
Configure cluster parameters interactively or via command line:
```bash
hyp configure --resource-name-prefix my-cluster --stage prod
```
#### Validate Configuration
Validate the configuration file syntax:
```bash
hyp validate
```
#### Create Cluster Stack
Create the cluster stack using the configured parameters:
```bash
hyp create --region <region>
```
**Note**: The region flag is optional. If not provided, the command will use the default region from your AWS credentials configuration.
#### List Cluster Stacks
```bash
hyp list cluster-stack
```
| Option | Type | Description |
|--------|------|-------------|
| `--region <region>` | Optional | The AWS region to list stacks from. |
| `--status "['CREATE_COMPLETE', 'UPDATE_COMPLETE']"` | Optional | Filter by stack status. |
| `--debug` | Optional | Enable debug mode for detailed logging. |
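For example, to show only successfully created or updated stacks in one region (values are illustrative), using the status format shown above:
```bash
hyp list cluster-stack --region us-west-2 --status "['CREATE_COMPLETE', 'UPDATE_COMPLETE']"
```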
#### Describe Cluster Stack
```bash
hyp describe cluster-stack <stack-name>
```
| Option | Type | Description |
|--------|------|-------------|
| `--region <region>` | Optional | The AWS region where the stack exists. |
| `--debug` | Optional | Enable debug mode for detailed logging. |
#### Delete Cluster Stack
Delete a HyperPod cluster stack. Removes the specified CloudFormation stack and all associated AWS resources. This operation cannot be undone.
```bash
hyp delete cluster-stack <stack-name>
```
| Option | Type | Description |
|--------|------|-------------|
| `--region <region>` | Required | The AWS region where the stack exists. |
| `--retain-resources S3Bucket-TrainingData,EFSFileSystem-Models` | Optional | Comma-separated list of logical resource IDs to retain during deletion (only works on DELETE_FAILED stacks). Resource names are shown in failed deletion output, or use the AWS CLI: `aws cloudformation list-stack-resources --stack-name STACK_NAME --region REGION`. |
| `--debug` | Optional | Enable debug mode for detailed logging. |
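For example, to retry deletion of a stack in `DELETE_FAILED` state while retaining selected resources (stack and resource names are illustrative):
```bash
hyp delete cluster-stack my-stack --region us-west-2 \
    --retain-resources S3Bucket-TrainingData,EFSFileSystem-Models
```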
#### Update Existing Cluster
```bash
hyp update cluster --cluster-name my-cluster \
--instance-groups '[{"InstanceCount":2,"InstanceGroupName":"worker-nodes","InstanceType":"ml.m5.large"}]' \
--node-recovery Automatic
```
#### Reset Configuration
Reset configuration to default values:
```bash
hyp reset
```
### Training
#### **Option 1**: Create PyTorch job through init experience
#### Initialize PyTorch Job Configuration
Initialize a new PyTorch job configuration in the current directory:
```bash
hyp init hyp-pytorch-job
```
#### Configure PyTorch Job Parameters
Configure PyTorch job parameters interactively or via command line:
```bash
hyp configure --job-name my-pytorch-job
```
#### Validate Configuration
Validate the configuration file syntax:
```bash
hyp validate
```
#### Create PyTorch Job
Create the PyTorch job using the configured parameters:
```bash
hyp create
```
#### **Option 2**: Create PyTorch job through create command
```bash
hyp create hyp-pytorch-job \
--version 1.0 \
--job-name test-pytorch-job \
--image pytorch/pytorch:latest \
--command '[python, train.py]' \
--args '[--epochs=10, --batch-size=32]' \
--environment '{"PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:32"}' \
--pull-policy "IfNotPresent" \
--instance-type ml.p4d.24xlarge \
--tasks-per-node 8 \
--label-selector '{"accelerator": "nvidia", "network": "efa"}' \
--deep-health-check-passed-nodes-only true \
--scheduler-type "kueue" \
--queue-name "training-queue" \
--priority "high" \
--max-retry 3 \
--accelerators 8 \
--vcpu 96.0 \
--memory 1152.0 \
--accelerators-limit 8 \
--vcpu-limit 96.0 \
--memory-limit 1152.0 \
--preferred-topology "topology.kubernetes.io/zone=us-west-2a" \
--volume name=model-data,type=hostPath,mount_path=/data,path=/data \
--volume name=training-output,type=pvc,mount_path=/data2,claim_name=my-pvc,read_only=false
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--job-name` | TEXT | Yes | Unique name for the training job (1-63 characters, alphanumeric with hyphens) |
| `--image` | TEXT | Yes | Docker image URI containing your training code |
| `--namespace` | TEXT | No | Kubernetes namespace |
| `--command` | ARRAY | No | Command to run in the container (array of strings) |
| `--args` | ARRAY | No | Arguments for the entry script (array of strings) |
| `--environment` | OBJECT | No | Environment variables as key-value pairs |
| `--pull-policy` | TEXT | No | Image pull policy (Always, Never, IfNotPresent) |
| `--instance-type` | TEXT | No | Instance type for training |
| `--node-count` | INTEGER | No | Number of nodes (minimum: 1) |
| `--tasks-per-node` | INTEGER | No | Number of tasks per node (minimum: 1) |
| `--label-selector` | OBJECT | No | Node label selector as key-value pairs |
| `--deep-health-check-passed-nodes-only` | BOOLEAN | No | Schedule pods only on nodes that passed deep health check (default: false) |
| `--scheduler-type` | TEXT | No | Scheduler type |
| `--queue-name` | TEXT | No | Queue name for job scheduling (1-63 characters, alphanumeric with hyphens) |
| `--priority` | TEXT | No | Priority class for job scheduling |
| `--max-retry` | INTEGER | No | Maximum number of job retries (minimum: 0) |
| `--volume` | ARRAY | No | List of volume configurations (Refer [Volume Configuration](#volume-configuration) for detailed parameter info) |
| `--service-account-name` | TEXT | No | Service account name |
| `--accelerators` | INTEGER | No | Number of accelerators a.k.a GPUs or Trainium Chips |
| `--vcpu` | FLOAT | No | Number of vCPUs |
| `--memory` | FLOAT | No | Amount of memory in GiB |
| `--accelerators-limit` | INTEGER | No | Limit for the number of accelerators a.k.a GPUs or Trainium Chips |
| `--vcpu-limit` | FLOAT | No | Limit for the number of vCPUs |
| `--memory-limit` | FLOAT | No | Limit for the amount of memory in GiB |
| `--preferred-topology` | TEXT | No | Preferred topology annotation for scheduling |
| `--required-topology` | TEXT | No | Required topology annotation for scheduling |
| `--debug` | FLAG | No | Enable debug mode (default: false) |
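Per the table above, only `--job-name` and `--image` are required, so a minimal invocation can be as small as (image value is illustrative):
```bash
hyp create hyp-pytorch-job \
    --job-name my-minimal-job \
    --image pytorch/pytorch:latest
```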
#### List Training Jobs
```bash
hyp list hyp-pytorch-job
```
#### Describe a Training Job
```bash
hyp describe hyp-pytorch-job --job-name <job-name>
```
#### Listing Pods
This command lists all the pods associated with a specific training job.
```bash
hyp list-pods hyp-pytorch-job --job-name <job-name>
```
* `--job-name` (string) - Required. The name of the job to list pods for.
#### Accessing Logs
This command retrieves the logs for a specific pod within a training job.
```bash
hyp get-logs hyp-pytorch-job --pod-name <pod-name> --job-name <job-name>
```
| Parameter | Required | Description |
|--------|------|-------------|
| `--job-name` | Yes | The name of the job to get the log for. |
| `--pod-name` | Yes | The name of the pod to get the log from. |
| `--namespace` | No | The namespace of the job. Defaults to 'default'. |
| `--container` | No | The container name to get logs from. |
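For example, to fetch logs from a specific container of a pod in a non-default namespace (all names are illustrative):
```bash
hyp get-logs hyp-pytorch-job \
    --job-name my-pytorch-job \
    --pod-name my-pytorch-job-pod-0 \
    --namespace my-namespace \
    --container pytorch
```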
#### Get Operator Logs
```bash
hyp get-operator-logs hyp-pytorch-job --since-hours 0.5
```
#### Delete a Training Job
```bash
hyp delete hyp-pytorch-job --job-name <job-name>
```
### Inference
### JumpStart Endpoint Creation
#### **Option 1**: Create JumpStart endpoint through init experience
#### Initialize JumpStart Endpoint Configuration
Initialize a new JumpStart endpoint configuration in the current directory:
```bash
hyp init hyp-jumpstart-endpoint
```
#### Configure JumpStart Endpoint Parameters
Configure JumpStart endpoint parameters interactively or via command line:
```bash
hyp configure --endpoint-name my-jumpstart-endpoint
```
#### Validate Configuration
Validate the configuration file syntax:
```bash
hyp validate
```
#### Create JumpStart Endpoint
Create the JumpStart endpoint using the configured parameters:
```bash
hyp create
```
#### **Option 2**: Create JumpStart endpoint through create command
Pre-trained JumpStart model IDs can be found at https://sagemaker.readthedocs.io/en/v2.82.0/doc_utils/jumpstart.html and passed to the endpoint creation call.
```bash
hyp create hyp-jumpstart-endpoint \
--version 1.0 \
    --model-id jumpstart-model-id \
--instance-type ml.g5.8xlarge \
--endpoint-name endpoint-jumpstart
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--model-id` | TEXT | Yes | JumpStart model identifier (1-63 characters, alphanumeric with hyphens) |
| `--instance-type` | TEXT | Yes | EC2 instance type for inference (must start with "ml.") |
| `--namespace` | TEXT | No | Kubernetes namespace |
| `--metadata-name` | TEXT | No | Name of the JumpStart endpoint object |
| `--accept-eula` | BOOLEAN | No | Whether model terms of use have been accepted (default: false) |
| `--model-version` | TEXT | No | Semantic version of the model (e.g., "1.0.0", 5-14 characters) |
| `--endpoint-name` | TEXT | No | Name of SageMaker endpoint (1-63 characters, alphanumeric with hyphens) |
| `--tls-certificate-output-s3-uri` | TEXT | No | S3 URI to write the TLS certificate (optional) |
| `--debug` | FLAG | No | Enable debug mode (default: false) |
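Some JumpStart models are gated behind an end-user license agreement; per the table above, acceptance is indicated with `--accept-eula` (the model ID is a placeholder):
```bash
hyp create hyp-jumpstart-endpoint \
    --model-id <gated-model-id> \
    --instance-type ml.g5.8xlarge \
    --accept-eula true
```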
#### Invoke a JumpStart Model Endpoint
```bash
hyp invoke hyp-jumpstart-endpoint \
--endpoint-name endpoint-jumpstart \
--body '{"inputs":"What is the capital of USA?"}'
```
#### Managing an Endpoint
```bash
hyp list hyp-jumpstart-endpoint
hyp describe hyp-jumpstart-endpoint --name endpoint-jumpstart
```
#### List Pods
```bash
hyp list-pods hyp-jumpstart-endpoint
```
#### Get Logs
```bash
hyp get-logs hyp-jumpstart-endpoint --pod-name <pod-name>
```
#### Get Operator Logs
```bash
hyp get-operator-logs hyp-jumpstart-endpoint --since-hours 0.5
```
#### Deleting an Endpoint
```bash
hyp delete hyp-jumpstart-endpoint --name endpoint-jumpstart
```
### Custom Endpoint Creation
#### **Option 1**: Create custom endpoint through init experience
#### Initialize Custom Endpoint Configuration
Initialize a new custom endpoint configuration in the current directory:
```bash
hyp init hyp-custom-endpoint
```
#### Configure Custom Endpoint Parameters
Configure custom endpoint parameters interactively or via command line:
```bash
hyp configure --endpoint-name my-custom-endpoint
```
#### Validate Configuration
Validate the configuration file syntax:
```bash
hyp validate
```
#### Create Custom Endpoint
Create the custom endpoint using the configured parameters:
```bash
hyp create
```
#### **Option 2**: Create custom endpoint through create command
```bash
hyp create hyp-custom-endpoint \
--version 1.0 \
--endpoint-name endpoint-custom \
--model-name my-pytorch-model \
--model-source-type s3 \
--model-location my-pytorch-training \
--model-volume-mount-name test-volume \
--s3-bucket-name your-bucket \
--s3-region us-east-1 \
--instance-type ml.g5.8xlarge \
--image-uri 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:latest \
--container-port 8080
```
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `--instance-type` | TEXT | Yes | EC2 instance type for inference (must start with "ml.") |
| `--model-name` | TEXT | Yes | Name of model to create on SageMaker (1-63 characters, alphanumeric with hyphens) |
| `--model-source-type` | TEXT | Yes | Model source type ("s3" or "fsx") |
| `--image-uri` | TEXT | Yes | Docker image URI for inference |
| `--container-port` | INTEGER | Yes | Port on which model server listens (1-65535) |
| `--model-volume-mount-name` | TEXT | Yes | Name of the model volume mount |
| `--namespace` | TEXT | No | Kubernetes namespace |
| `--metadata-name` | TEXT | No | Name of the custom endpoint object |
| `--endpoint-name` | TEXT | No | Name of SageMaker endpoint (1-63 characters, alphanumeric with hyphens) |
| `--env` | OBJECT | No | Environment variables as key-value pairs |
| `--metrics-enabled` | BOOLEAN | No | Enable metrics collection (default: false) |
| `--model-version` | TEXT | No | Version of the model (semantic version format) |
| `--model-location` | TEXT | No | Specific model data location |
| `--prefetch-enabled` | BOOLEAN | No | Whether to pre-fetch model data (default: false) |
| `--tls-certificate-output-s3-uri` | TEXT | No | S3 URI for TLS certificate output |
| `--fsx-dns-name` | TEXT | No | FSx File System DNS Name |
| `--fsx-file-system-id` | TEXT | No | FSx File System ID |
| `--fsx-mount-name` | TEXT | No | FSx File System Mount Name |
| `--s3-bucket-name` | TEXT | No | S3 bucket location |
| `--s3-region` | TEXT | No | S3 bucket region |
| `--model-volume-mount-path` | TEXT | No | Path inside container for model volume (default: "/opt/ml/model") |
| `--resources-limits` | OBJECT | No | Resource limits for the worker |
| `--resources-requests` | OBJECT | No | Resource requests for the worker |
| `--dimensions` | OBJECT | No | CloudWatch Metric dimensions as key-value pairs |
| `--metric-collection-period` | INTEGER | No | Period for CloudWatch query (default: 300) |
| `--metric-collection-start-time` | INTEGER | No | StartTime for CloudWatch query (default: 300) |
| `--metric-name` | TEXT | No | Metric name to query for CloudWatch trigger |
| `--metric-stat` | TEXT | No | Statistics metric for CloudWatch (default: "Average") |
| `--metric-type` | TEXT | No | Type of metric for HPA ("Value" or "Average", default: "Average") |
| `--min-value` | NUMBER | No | Minimum metric value for empty CloudWatch response (default: 0) |
| `--cloud-watch-trigger-name` | TEXT | No | Name for the CloudWatch trigger |
| `--cloud-watch-trigger-namespace` | TEXT | No | AWS CloudWatch namespace for the metric |
| `--target-value` | NUMBER | No | Target value for the CloudWatch metric |
| `--use-cached-metrics` | BOOLEAN | No | Enable caching of metric values (default: true) |
| `--invocation-endpoint` | TEXT | No | Invocation endpoint path (default: "invocations") |
| `--debug` | FLAG | No | Enable debug mode (default: false) |
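For an FSx-backed model, `--model-source-type fsx` is combined with the FSx options from the table above instead of the S3 options; a sketch with illustrative values:
```bash
hyp create hyp-custom-endpoint \
    --endpoint-name endpoint-custom-fsx \
    --model-name my-pytorch-model \
    --model-source-type fsx \
    --fsx-file-system-id fs-0123456789abcdef0 \
    --fsx-dns-name fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com \
    --fsx-mount-name my-mount-name \
    --model-volume-mount-name model-weights \
    --instance-type ml.g5.8xlarge \
    --image-uri 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:latest \
    --container-port 8080
```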
#### Invoke a Custom Inference Endpoint
```bash
hyp invoke hyp-custom-endpoint \
--endpoint-name endpoint-custom-pytorch \
--body '{"inputs":"What is the capital of USA?"}'
```
#### Managing an Endpoint
```bash
hyp list hyp-custom-endpoint
hyp describe hyp-custom-endpoint --name endpoint-custom
```
#### List Pods
```bash
hyp list-pods hyp-custom-endpoint
```
#### Get Logs
```bash
hyp get-logs hyp-custom-endpoint --pod-name <pod-name>
```
#### Get Operator Logs
```bash
hyp get-operator-logs hyp-custom-endpoint --since-hours 0.5
```
#### Deleting an Endpoint
```bash
hyp delete hyp-custom-endpoint --name endpoint-custom
```
## SDK
Along with the CLI, the package also provides SDK classes that cover the same cluster management, training, and inference functionality.
### Cluster Management SDK
#### Creating a Cluster Stack
```python
from sagemaker.hyperpod.cluster_management.hp_cluster_stack import HpClusterStack
# Initialize cluster stack configuration
cluster_stack = HpClusterStack(
stage="prod",
resource_name_prefix="my-hyperpod",
hyperpod_cluster_name="my-hyperpod-cluster",
eks_cluster_name="my-hyperpod-eks",
# Infrastructure components
create_vpc_stack=True,
create_eks_cluster_stack=True,
create_hyperpod_cluster_stack=True,
# Network configuration
vpc_cidr="10.192.0.0/16",
availability_zone_ids=["use2-az1", "use2-az2"],
# Instance group configuration
instance_group_settings=[
{
"InstanceCount": 1,
"InstanceGroupName": "controller-group",
"InstanceType": "ml.t3.medium",
"TargetAvailabilityZoneId": "use2-az2"
}
]
)
# Create the cluster stack
response = cluster_stack.create(region="us-east-2")
```
#### Listing Cluster Stacks
```python
# List all cluster stacks
stacks = HpClusterStack.list(region="us-east-2")
print(f"Found {len(stacks['StackSummaries'])} stacks")
```
#### Describing a Cluster Stack
```python
# Describe a specific cluster stack
stack_info = HpClusterStack.describe("my-stack-name", region="us-east-2")
print(f"Stack status: {stack_info['Stacks'][0]['StackStatus']}")
```
#### Monitoring Cluster Status
```python
from sagemaker.hyperpod.cluster_management.hp_cluster_stack import HpClusterStack
stack = HpClusterStack()
response = stack.create(region="us-west-2")
status = stack.get_status(region="us-west-2")
print(status)
```
### Training SDK
#### Creating a Training Job
```python
from sagemaker.hyperpod.training.hyperpod_pytorch_job import HyperPodPytorchJob
from sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config import (
ReplicaSpec, Template, Spec, Containers, Resources, RunPolicy
)
from sagemaker.hyperpod.common.config.metadata import Metadata
# Define job specifications
nproc_per_node = "1"  # Number of processes per node

replica_specs = [
    ReplicaSpec(
        name="pod",  # Replica name
        template=Template(
            spec=Spec(
                containers=[
                    Containers(
                        name="container-name",  # Container name
                        # Training image
                        image="123456789012.dkr.ecr.us-west-2.amazonaws.com/my-training-image:latest",
                        image_pull_policy="Always",  # Always pull image
                        resources=Resources(
                            requests={"nvidia.com/gpu": "0"},  # No GPUs requested
                            limits={"nvidia.com/gpu": "0"},  # No GPU limit
                        ),
                        command=["python", "train.py"],  # Command to run
                        args=["--epochs", "10", "--batch-size", "32"],  # Script arguments
                    )
                ]
            )
        ),
    )
]

# Keep pods after completion
run_policy = RunPolicy(clean_pod_policy="None")

# Create and start the PyTorch job
pytorch_job = HyperPodPytorchJob(
    metadata=Metadata(name="demo"),  # Job name
    nproc_per_node=nproc_per_node,   # Processes per node
    replica_specs=replica_specs,     # Replica specifications
    run_policy=run_policy,           # Run policy
)

# Launch the job
pytorch_job.create()
```
#### List Training Jobs
```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
import yaml
# List all PyTorch jobs
jobs = HyperPodPytorchJob.list()
print(yaml.dump(jobs))
```
#### Describe a Training Job
```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
# Get an existing job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job)
```
#### List Pods for a Training Job
```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
# List Pods for an existing job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job.list_pods())
```
#### Get Logs from a Pod
```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
# Get pod logs for a job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job.get_logs_from_pod("pod-name"))
```
#### Get Training Operator Logs
```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
# Get training operator logs
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job.get_operator_logs(since_hours=0.1))
```
#### Delete a Training Job
```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
# Get an existing job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
# Delete the job
job.delete()
```
### Inference SDK
#### Creating a JumpStart Model Endpoint
Pre-trained JumpStart model IDs can be found at https://sagemaker.readthedocs.io/en/v2.82.0/doc_utils/jumpstart.html and passed to the endpoint creation call.
```python
from sagemaker.hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, TlsConfig
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
model=Model(
model_id='deepseek-llm-r1-distill-qwen-1-5b'
)
server=Server(
instance_type='ml.g5.8xlarge',
)
endpoint_name=SageMakerEndpoint(name='<my-endpoint-name>')
js_endpoint=HPJumpStartEndpoint(
model=model,
server=server,
sage_maker_endpoint=endpoint_name
)
js_endpoint.create()
```
#### Creating a Custom Inference Endpoint (with S3)
```python
from sagemaker.hyperpod.inference.config.hp_endpoint_config import CloudWatchTrigger, Dimensions, AutoScalingSpec, Metrics, S3Storage, ModelSourceConfig, TlsConfig, EnvironmentVariables, ModelInvocationPort, ModelVolumeMount, Resources, Worker
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
model_source_config = ModelSourceConfig(
model_source_type='s3',
model_location="<my-model-folder-in-s3>",
s3_storage=S3Storage(
bucket_name='<my-model-artifacts-bucket>',
region='us-east-2',
),
)
environment_variables = [
EnvironmentVariables(name="HF_MODEL_ID", value="/opt/ml/model"),
EnvironmentVariables(name="SAGEMAKER_PROGRAM", value="inference.py"),
EnvironmentVariables(name="SAGEMAKER_SUBMIT_DIRECTORY", value="/opt/ml/model/code"),
EnvironmentVariables(name="MODEL_CACHE_ROOT", value="/opt/ml/model"),
EnvironmentVariables(name="SAGEMAKER_ENV", value="1"),
]
worker = Worker(
image='763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi2.3.1-gpu-py311-cu124-ubuntu22.04-v2.0',
model_volume_mount=ModelVolumeMount(
name='model-weights',
),
model_invocation_port=ModelInvocationPort(container_port=8080),
resources=Resources(
requests={"cpu": "30000m", "nvidia.com/gpu": 1, "memory": "100Gi"},
limits={"nvidia.com/gpu": 1}
),
environment_variables=environment_variables,
)
tls_config=TlsConfig(tls_certificate_output_s3_uri='s3://<my-tls-bucket-name>')
custom_endpoint = HPEndpoint(
endpoint_name='<my-endpoint-name>',
instance_type='ml.g5.8xlarge',
model_name='deepseek15b-test-model-name',
tls_config=tls_config,
model_source_config=model_source_config,
worker=worker,
)
custom_endpoint.create()
```
#### List Endpoints
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
# List JumpStart endpoints
jumpstart_endpoints = HPJumpStartEndpoint.list()
print(jumpstart_endpoints)
# List custom endpoints
custom_endpoints = HPEndpoint.list()
print(custom_endpoints)
```
#### Describe an Endpoint
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
# Get JumpStart endpoint details
jumpstart_endpoint = HPJumpStartEndpoint.get(name="js-endpoint-name", namespace="test")
print(jumpstart_endpoint)
# Get custom endpoint details
custom_endpoint = HPEndpoint.get(name="endpoint-custom")
print(custom_endpoint)
```
#### Invoke an Endpoint
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
data = '{"inputs":"What is the capital of USA?"}'
jumpstart_endpoint = HPJumpStartEndpoint.get(name="endpoint-jumpstart")
response = jumpstart_endpoint.invoke(body=data).body.read()
print(response)
custom_endpoint = HPEndpoint.get(name="endpoint-custom")
response = custom_endpoint.invoke(body=data).body.read()
print(response)
```
#### List Pods
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
# List pods
js_pods = HPJumpStartEndpoint.list_pods()
print(js_pods)
c_pods = HPEndpoint.list_pods()
print(c_pods)
```
#### Get Logs
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
# Get logs from pod
js_logs = HPJumpStartEndpoint.get_logs(pod="<pod-name>")
print(js_logs)
c_logs = HPEndpoint.get_logs(pod="<pod-name>")
print(c_logs)
```
#### Get Operator Logs
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
# Get operator logs for JumpStart endpoints
print(HPJumpStartEndpoint.get_operator_logs(since_hours=0.1))
# Get operator logs for custom endpoints
print(HPEndpoint.get_operator_logs(since_hours=0.1))
```
#### Delete an Endpoint
```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint
# Delete JumpStart endpoint
jumpstart_endpoint = HPJumpStartEndpoint.get(name="endpoint-jumpstart")
jumpstart_endpoint.delete()
# Delete custom endpoint
custom_endpoint = HPEndpoint.get(name="endpoint-custom")
custom_endpoint.delete()
```
#### Observability - Getting Monitoring Information
```python
from sagemaker.hyperpod.observability.utils import get_monitoring_config
monitor_config = get_monitoring_config()
```
## Examples
#### Cluster Management Example Notebooks
[CLI Cluster Management Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/cluster_management/cluster_creation_init_experience.ipynb)
[SDK Cluster Management Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/cluster_management/cluster_creation_sdk_experience.ipynb)
#### Training Example Notebooks
[CLI Training Init Experience Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/training/CLI/training-init-experience.ipynb)
[CLI Training Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/training/CLI/training-e2e-cli.ipynb)
[SDK Training Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/training/SDK/training_sdk_example.ipynb)
#### Inference Example Notebooks
##### CLI
[CLI Inference Jumpstart Model Init Experience Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-jumpstart-init-experience.ipynb)
[CLI Inference JumpStart Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-jumpstart-e2e-cli.ipynb)
[CLI Inference FSX Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-fsx-model-e2e-cli.ipynb)
[CLI Inference S3 Model Init Experience Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-s3-model-init-experience.ipynb)
[CLI Inference S3 Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-s3-model-e2e-cli.ipynb)
##### SDK
[SDK Inference JumpStart Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/SDK/inference-jumpstart-e2e.ipynb)
[SDK Inference FSX Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/SDK/inference-fsx-model-e2e.ipynb)
[SDK Inference S3 Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/SDK/inference-s3-model-e2e.ipynb)
## Disclaimer
* The CLI and SDK require access to the user's file system to set and get context and to function properly.
They need to read configuration files, such as the kubeconfig, to establish the necessary environment settings.
## Working behind a proxy server?
* Follow the steps in the [AWS CLI proxy configuration guide](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-proxy.html) to set up HTTP proxy connections.
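On Linux or macOS, this typically means exporting the standard proxy environment variables before running `hyp` (the proxy address is illustrative):
```bash
export HTTP_PROXY=http://proxy.example.com:8080
export HTTPS_PROXY=http://proxy.example.com:8080
```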
Raw data
{
"_id": null,
"home_page": "https://github.com/aws/sagemaker-hyperpod-cli",
"name": "sagemaker-hyperpod",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": null,
"author": "Amazon Web Services",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/f8/ea/53bb571418027697a87809c273f4aca95ff3527a81bcbe988f9dd946ce8e/sagemaker_hyperpod-3.3.1.tar.gz",
"platform": null,
"description": "\n# SageMaker HyperPod command-line interface\n\nThe Amazon SageMaker HyperPod command-line interface (HyperPod CLI) is a tool that helps manage clusters, training jobs, and inference endpoints on the SageMaker HyperPod clusters orchestrated by Amazon EKS.\n\nThis documentation serves as a reference for the available HyperPod CLI commands. For a comprehensive user guide, see [Orchestrating SageMaker HyperPod clusters with Amazon EKS](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks.html) in the *Amazon SageMaker Developer Guide*.\n\nNote: Old `hyperpod`CLI V2 has been moved to `release_v2` branch. Please refer [release_v2 branch](https://github.com/aws/sagemaker-hyperpod-cli/tree/release_v2) for usage.\n\n## Table of Contents\n- [Overview](#overview)\n- [Prerequisites](#prerequisites)\n- [Platform Support](#platform-support)\n- [ML Framework Support](#ml-framework-support)\n- [Installation](#installation)\n- [Usage](#usage)\n - [Getting Started](#getting-started)\n - [CLI](#cli)\n - [Cluster Management](#cluster-management)\n - [Training](#training)\n - [Inference](#inference)\n - [Jumpstart Endpoint](#jumpstart-endpoint-creation)\n - [Custom Endpoint](#custom-endpoint-creation)\n - [SDK](#sdk)\n - [Cluster Management](#cluster-management-sdk)\n - [Training](#training-sdk)\n - [Inference](#inference-sdk)\n- [Examples](#examples)\n \n\n## Overview\n\nThe SageMaker HyperPod CLI is a tool that helps create training jobs and inference endpoint deployments to the Amazon SageMaker HyperPod clusters orchestrated by Amazon EKS. It provides a set of commands for managing the full lifecycle of jobs, including create, describe, list, and delete operations, as well as accessing pod and operator logs where applicable. The CLI is designed to abstract away the complexity of working directly with Kubernetes for these core actions of managing jobs on SageMaker HyperPod clusters orchestrated by Amazon EKS.\n\n## Prerequisites\n\n### Region Configuration\n\n**Important**: For commands that accept the `--region` option, if no region is explicitly provided, the command will use the default region from your AWS credentials configuration.\n\n### Prerequisites for Training\n\n- HyperPod CLI currently supports starting PyTorchJobs. To start a job, you need to install Training Operator first. \n - You can follow [pytorch operator doc](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-eks-operator-install.html) to install it.\n\n### Prerequisites for Inference \n\n- HyperPod CLI supports creating Inference Endpoints through jumpstart and through custom Endpoint config \n - You can follow [inference operator doc](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-model-deployment-setup.html) to install it.\n\n## Platform Support\n\nSageMaker HyperPod CLI currently supports Linux and MacOS platforms. Windows platform is not supported now.\n\n## ML Framework Support\n\nSageMaker HyperPod CLI currently supports start training job with:\n- PyTorch ML Framework. Version requirements: PyTorch >= 1.10\n\n## Installation\n\n1. Make sure that your local python version is 3.8, 3.9, 3.10 or 3.11.\n\n2. Install the sagemaker-hyperpod-cli package.\n\n ```bash\n pip install sagemaker-hyperpod\n ```\n\n3. 
Verify if the installation succeeded by running the following command.\n\n ```bash\n hyp --help\n ```\n\n## Usage\n\nThe HyperPod CLI provides the following commands:\n\n- [Getting Started](#getting-started)\n- [CLI](#cli)\n - [Cluster Management](#cluster-management)\n - [Training](#training)\n - [Inference](#inference)\n - [Jumpstart Endpoint](#jumpstart-endpoint-creation)\n - [Custom Endpoint](#custom-endpoint-creation)\n- [SDK](#sdk)\n - [Cluster Management](#cluster-management-sdk)\n - [Training](#training-sdk)\n - [Inference](#inference-sdk)\n\n\n### Getting Started\n\n#### Getting Cluster information\n\nThis command lists the available SageMaker HyperPod clusters and their capacity information.\n\n```bash\nhyp list-cluster\n```\n\n| Option | Type | Description |\n|--------|------|-------------|\n| `--region <region>` | Optional | The region that the SageMaker HyperPod and EKS clusters are located. If not specified, it will be set to the region from the current AWS account credentials. |\n| `--namespace <namespace>` | Optional | The namespace that users want to check the quota with. Only the SageMaker managed namespaces are supported. |\n| `--output <json\\|table>` | Optional | The output format. Available values are `table` and `json`. The default value is `json`. |\n| `--debug` | Optional | Enable debug mode for detailed logging. |\n\n#### Connecting to a Cluster\n\nThis command configures the local Kubectl environment to interact with the specified SageMaker HyperPod cluster and namespace.\n\n```bash\nhyp set-cluster-context --cluster-name <cluster-name>\n```\n\n| Option | Type | Description |\n|--------|------|-------------|\n| `--cluster-name <cluster-name>` | Required | The SageMaker HyperPod cluster name to configure with. |\n| `--namespace <namespace>` | Optional | The namespace that you want to connect to. If not specified, Hyperpod cli commands will auto discover the accessible namespace. |\n| `--region <region>` | Optional | The AWS region where the HyperPod cluster resides. |\n| `--debug` | Optional | Enable debug mode for detailed logging. |\n\n#### Getting Cluster Context\n\nGet all the context related to the current set Cluster\n\n```bash\nhyp get-cluster-context\n```\n\n| Option | Type | Description |\n|--------|------|-------------|\n| `--debug` | Optional | Enable debug mode for detailed logging. |\n\n\n## CLI\n\n### Cluster Management \n\n**Important**: For commands that accept the `--region` option, if no region is explicitly provided, the command will use the default region from your AWS credentials configuration.\n\n**Cluster stack names must be unique within each AWS region.** If you attempt to create a cluster stack with a name that already exists in the same region, the deployment will fail.\n\n#### Initialize Cluster Configuration\n\nInitialize a new cluster configuration in the current directory:\n\n```bash\nhyp init cluster-stack\n```\n\n**Important**: The `resource_name_prefix` parameter in the generated `config.yaml` file serves as the primary identifier for all AWS resources created during deployment. Each deployment must use a unique resource name prefix to avoid conflicts. 
This prefix is automatically appended with a unique identifier during cluster creation to ensure resource uniqueness.\n\n#### Configure Cluster Parameters\n\nConfigure cluster parameters interactively or via command line:\n\n```bash\nhyp configure --resource-name-prefix my-cluster --stage prod\n```\n\n#### Validate Configuration\n\nValidate the configuration file syntax:\n\n```bash\nhyp validate\n```\n\n#### Create Cluster Stack\n\nCreate the cluster stack using the configured parameters:\n\n```bash\nhyp create --region <region>\n```\n\n**Note**: The region flag is optional. If not provided, the command will use the default region from your AWS credentials configuration.\n\n#### List Cluster Stacks\n\n```bash\nhyp list cluster-stack\n```\n\n| Option | Type | Description |\n|--------|------|-------------|\n| `--region <region>` | Optional | The AWS region to list stacks from. |\n| `--status \"['CREATE_COMPLETE', 'UPDATE_COMPLETE']\"` | Optional | Filter by stack status. |\n| `--debug` | Optional | Enable debug mode for detailed logging. |\n\n#### Describe Cluster Stack\n\n```bash\nhyp describe cluster-stack <stack-name>\n```\n\n| Option | Type | Description |\n|--------|------|-------------|\n| `--region <region>` | Optional | The AWS region where the stack exists. |\n| `--debug` | Optional | Enable debug mode for detailed logging. |\n\n#### Delete Cluster Stack\n\nDelete a HyperPod cluster stack. Removes the specified CloudFormation stack and all associated AWS resources. This operation cannot be undone.\n\n```bash\n hyp delete cluster-stack <stack-name>\n```\n\n| Option | Type | Description |\n|--------|------|-------------|\n| `--region <region>` | Required | The AWS region where the stack exists. |\n| `--retain-resources S3Bucket-TrainingData,EFSFileSystem-Models` | Optional | Comma-separated list of logical resource IDs to retain during deletion (only works on DELETE_FAILED stacks). Resource names are shown in failed deletion output, or use AWS CLI: `aws cloudformation list-stack-resources STACK_NAME --region REGION`. |\n| `--debug` | Optional | Enable debug mode for detailed logging. 
|\n\n\n#### Update Existing Cluster\n\n```bash\nhyp update cluster --cluster-name my-cluster \\\n --instance-groups '[{\"InstanceCount\":2,\"InstanceGroupName\":\"worker-nodes\",\"InstanceType\":\"ml.m5.large\"}]' \\\n --node-recovery Automatic\n```\n\n#### Reset Configuration\n\nReset configuration to default values:\n\n```bash\nhyp reset\n```\n\n### Training \n\n#### **Option 1**: Create Pytorch job through init experience\n\n#### Initialize Pytorch Job Configuration\n\nInitialize a new pytorch job configuration in the current directory:\n\n```bash\nhyp init hyp-pytorch-job\n```\n\n#### Configure Pytorch Job Parameters\n\nConfigure pytorch job parameters interactively or via command line:\n\n```bash\nhyp configure --job-name my-pytorch-job\n```\n\n#### Validate Configuration\n\nValidate the configuration file syntax:\n\n```bash\nhyp validate\n```\n\n#### Create Pytorch Job\n\nCreate the pytorch job using the configured parameters:\n\n```bash\nhyp create\n```\n\n\n#### **Option 2**: Create Pytorch job through create command\n\n```bash\nhyp create hyp-pytorch-job \\\n --version 1.0 \\\n --job-name test-pytorch-job \\\n --image pytorch/pytorch:latest \\\n --command '[python, train.py]' \\\n --args '[--epochs=10, --batch-size=32]' \\\n --environment '{\"PYTORCH_CUDA_ALLOC_CONF\": \"max_split_size_mb:32\"}' \\\n --pull-policy \"IfNotPresent\" \\\n --instance-type ml.p4d.24xlarge \\\n --tasks-per-node 8 \\\n --label-selector '{\"accelerator\": \"nvidia\", \"network\": \"efa\"}' \\\n --deep-health-check-passed-nodes-only true \\\n --scheduler-type \"kueue\" \\\n --queue-name \"training-queue\" \\\n --priority \"high\" \\\n --max-retry 3 \\\n --accelerators 8 \\\n --vcpu 96.0 \\\n --memory 1152.0 \\\n --accelerators-limit 8 \\\n --vcpu-limit 96.0 \\\n --memory-limit 1152.0 \\\n --preferred-topology \"topology.kubernetes.io/zone=us-west-2a\" \\\n --volume name=model-data,type=hostPath,mount_path=/data,path=/data \\\n --volume name=training-output,type=pvc,mount_path=/data2,claim_name=my-pvc,read_only=false\n```\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `--job-name` | TEXT | Yes | Unique name for the training job (1-63 characters, alphanumeric with hyphens) |\n| `--image` | TEXT | Yes | Docker image URI containing your training code |\n| `--namespace` | TEXT | No | Kubernetes namespace |\n| `--command` | ARRAY | No | Command to run in the container (array of strings) |\n| `--args` | ARRAY | No | Arguments for the entry script (array of strings) |\n| `--environment` | OBJECT | No | Environment variables as key-value pairs |\n| `--pull-policy` | TEXT | No | Image pull policy (Always, Never, IfNotPresent) |\n| `--instance-type` | TEXT | No | Instance type for training |\n| `--node-count` | INTEGER | No | Number of nodes (minimum: 1) |\n| `--tasks-per-node` | INTEGER | No | Number of tasks per node (minimum: 1) |\n| `--label-selector` | OBJECT | No | Node label selector as key-value pairs |\n| `--deep-health-check-passed-nodes-only` | BOOLEAN | No | Schedule pods only on nodes that passed deep health check (default: false) |\n| `--scheduler-type` | TEXT | No | Scheduler type |\n| `--queue-name` | TEXT | No | Queue name for job scheduling (1-63 characters, alphanumeric with hyphens) |\n| `--priority` | TEXT | No | Priority class for job scheduling |\n| `--max-retry` | INTEGER | No | Maximum number of job retries (minimum: 0) |\n| `--volume` | ARRAY | No | List of volume configurations (Refer [Volume Configuration](#volume-configuration) for 
detailed parameter info) |\n| `--service-account-name` | TEXT | No | Service account name |\n| `--accelerators` | INTEGER | No | Number of accelerators a.k.a GPUs or Trainium Chips |\n| `--vcpu` | FLOAT | No | Number of vCPUs |\n| `--memory` | FLOAT | No | Amount of memory in GiB |\n| `--accelerators-limit` | INTEGER | No | Limit for the number of accelerators a.k.a GPUs or Trainium Chips |\n| `--vcpu-limit` | FLOAT | No | Limit for the number of vCPUs |\n| `--memory-limit` | FLOAT | No | Limit for the amount of memory in GiB |\n| `--preferred-topology` | TEXT | No | Preferred topology annotation for scheduling |\n| `--required-topology` | TEXT | No | Required topology annotation for scheduling |\n| `--debug` | FLAG | No | Enable debug mode (default: false) |\n\n#### List Training Jobs\n\n```bash\nhyp list hyp-pytorch-job\n```\n\n#### Describe a Training Job\n\n```bash\nhyp describe hyp-pytorch-job --job-name <job-name>\n````\n\n#### Listing Pods\n\nThis command lists all the pods associated with a specific training job.\n\n```bash\nhyp list-pods hyp-pytorch-job --job-name <job-name>\n```\n\n* `job-name` (string) - Required. The name of the job to list pods for.\n\n#### Accessing Logs\n\nThis command retrieves the logs for a specific pod within a training job.\n\n```bash\nhyp get-logs hyp-pytorch-job --pod-name <pod-name> --job-name <job-name>\n```\n\n| Parameter | Required | Description |\n|--------|------|-------------|\n| `--job-name` | Yes | The name of the job to get the log for. |\n| `--pod-name` | Yes | The name of the pod to get the log from. |\n| `--namespace` | No | The namespace of the job. Defaults to 'default'. |\n| `--container` | No | The container name to get logs from. |\n\n#### Get Operator Logs\n\n```bash\nhyp get-operator-logs hyp-pytorch-job --since-hours 0.5\n```\n\n#### Delete a Training Job\n\n```bash\nhyp delete hyp-pytorch-job --job-name <job-name>\n```\n\n### Inference \n\n### Jumpstart Endpoint Creation\n\n#### **Option 1**: Create jumpstart endpoint through init experience\n\n#### Initialize Jumpstart Endpoint Configuration\n\nInitialize a new jumpstart endpoint configuration in the current directory:\n\n```bash\nhyp init hyp-jumpstart-endpoint\n```\n\n#### Configure Jumpstart Endpoint Parameters\n\nConfigure jumpstart endpoint parameters interactively or via command line:\n\n```bash\nhyp configure --endpoint-name my-jumpstart-endpoint\n```\n\n#### Validate Configuration\n\nValidate the configuration file syntax:\n\n```bash\nhyp validate\n```\n\n#### Create Jumpstart Endpoint\n\nCreate the jumpstart endpoint using the configured parameters:\n\n```bash\nhyp create\n```\n\n\n#### **Option 2**: Create jumpstart endpoint through create command\nPre-trained Jumpstart models can be gotten from https://sagemaker.readthedocs.io/en/v2.82.0/doc_utils/jumpstart.html and fed into the call for creating the endpoint\n\n```bash\nhyp create hyp-jumpstart-endpoint \\\n --version 1.0 \\\n --model-id jumpstart-model-id\\\n --instance-type ml.g5.8xlarge \\\n --endpoint-name endpoint-jumpstart\n```\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `--model-id` | TEXT | Yes | JumpStart model identifier (1-63 characters, alphanumeric with hyphens) |\n| `--instance-type` | TEXT | Yes | EC2 instance type for inference (must start with \"ml.\") |\n| `--namespace` | TEXT | No | Kubernetes namespace |\n| `--metadata-name` | TEXT | No | Name of the jumpstart endpoint object |\n| `--accept-eula` | BOOLEAN | No | Whether model terms of use 
have been accepted (default: false) |\n| `--model-version` | TEXT | No | Semantic version of the model (e.g., \"1.0.0\", 5-14 characters) |\n| `--endpoint-name` | TEXT | No | Name of SageMaker endpoint (1-63 characters, alphanumeric with hyphens) |\n| `--tls-certificate-output-s3-uri` | TEXT | No | S3 URI to write the TLS certificate (optional) |\n| `--debug` | FLAG | No | Enable debug mode (default: false) |\n\n\n#### Invoke a JumpstartModel Endpoint\n\n```bash\nhyp invoke hyp-jumpstart-endpoint \\\n --endpoint-name endpoint-jumpstart \\\n --body '{\"inputs\":\"What is the capital of USA?\"}'\n```\n\n\n#### Managing an Endpoint \n\n```bash\nhyp list hyp-jumpstart-endpoint\nhyp describe hyp-jumpstart-endpoint --name endpoint-jumpstart\n```\n\n#### List Pods\n\n```bash\nhyp list-pods hyp-jumpstart-endpoint\n```\n\n#### Get Logs\n\n```bash\nhyp get-logs hyp-jumpstart-endpoint --pod-name <pod-name>\n```\n\n#### Get Operator Logs\n\n```bash\nhyp get-operator-logs hyp-jumpstart-endpoint --since-hours 0.5\n```\n\n#### Deleting an Endpoint\n\n```bash\nhyp delete hyp-jumpstart-endpoint --name endpoint-jumpstart\n```\n\n\n### Custom Endpoint Creation\n#### **Option 1**: Create custom endpoint through init experience\n\n#### Initialize Custom Endpoint Configuration\n\nInitialize a new custom endpoint configuration in the current directory:\n\n```bash\nhyp init hyp-custom-endpoint\n```\n\n#### Configure Custom Endpoint Parameters\n\nConfigure custom endpoint parameters interactively or via command line:\n\n```bash\nhyp configure --endpoint-name my-custom-endpoint\n```\n\n#### Validate Configuration\n\nValidate the configuration file syntax:\n\n```bash\nhyp validate\n```\n\n#### Create Custom Endpoint\n\nCreate the custom endpoint using the configured parameters:\n\n```bash\nhyp create\n```\n\n\n#### **Option 2**: Create custom endpoint through create command\n```bash\nhyp create hyp-custom-endpoint \\\n --version 1.0 \\\n --endpoint-name endpoint-custom \\\n --model-name my-pytorch-model \\\n --model-source-type s3 \\\n --model-location my-pytorch-training \\\n --model-volume-mount-name test-volume \\\n --s3-bucket-name your-bucket \\\n --s3-region us-east-1 \\\n --instance-type ml.g5.8xlarge \\\n --image-uri 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:latest \\\n --container-port 8080\n```\n\n| Parameter | Type | Required | Description |\n|-----------|------|----------|-------------|\n| `--instance-type` | TEXT | Yes | EC2 instance type for inference (must start with \"ml.\") |\n| `--model-name` | TEXT | Yes | Name of model to create on SageMaker (1-63 characters, alphanumeric with hyphens) |\n| `--model-source-type` | TEXT | Yes | Model source type (\"s3\" or \"fsx\") |\n| `--image-uri` | TEXT | Yes | Docker image URI for inference |\n| `--container-port` | INTEGER | Yes | Port on which model server listens (1-65535) |\n| `--model-volume-mount-name` | TEXT | Yes | Name of the model volume mount |\n| `--namespace` | TEXT | No | Kubernetes namespace |\n| `--metadata-name` | TEXT | No | Name of the custom endpoint object |\n| `--endpoint-name` | TEXT | No | Name of SageMaker endpoint (1-63 characters, alphanumeric with hyphens) |\n| `--env` | OBJECT | No | Environment variables as key-value pairs |\n| `--metrics-enabled` | BOOLEAN | No | Enable metrics collection (default: false) |\n| `--model-version` | TEXT | No | Version of the model (semantic version format) |\n| `--model-location` | TEXT | No | Specific model data location |\n| `--prefetch-enabled` | BOOLEAN | No | Whether to 
pre-fetch model data (default: false) |\n| `--tls-certificate-output-s3-uri` | TEXT | No | S3 URI for TLS certificate output |\n| `--fsx-dns-name` | TEXT | No | FSx File System DNS Name |\n| `--fsx-file-system-id` | TEXT | No | FSx File System ID |\n| `--fsx-mount-name` | TEXT | No | FSx File System Mount Name |\n| `--s3-bucket-name` | TEXT | No | S3 bucket location |\n| `--s3-region` | TEXT | No | S3 bucket region |\n| `--model-volume-mount-path` | TEXT | No | Path inside container for model volume (default: \"/opt/ml/model\") |\n| `--resources-limits` | OBJECT | No | Resource limits for the worker |\n| `--resources-requests` | OBJECT | No | Resource requests for the worker |\n| `--dimensions` | OBJECT | No | CloudWatch Metric dimensions as key-value pairs |\n| `--metric-collection-period` | INTEGER | No | Period for CloudWatch query (default: 300) |\n| `--metric-collection-start-time` | INTEGER | No | StartTime for CloudWatch query (default: 300) |\n| `--metric-name` | TEXT | No | Metric name to query for CloudWatch trigger |\n| `--metric-stat` | TEXT | No | Statistics metric for CloudWatch (default: \"Average\") |\n| `--metric-type` | TEXT | No | Type of metric for HPA (\"Value\" or \"Average\", default: \"Average\") |\n| `--min-value` | NUMBER | No | Minimum metric value for empty CloudWatch response (default: 0) |\n| `--cloud-watch-trigger-name` | TEXT | No | Name for the CloudWatch trigger |\n| `--cloud-watch-trigger-namespace` | TEXT | No | AWS CloudWatch namespace for the metric |\n| `--target-value` | NUMBER | No | Target value for the CloudWatch metric |\n| `--use-cached-metrics` | BOOLEAN | No | Enable caching of metric values (default: true) |\n| `--invocation-endpoint` | TEXT | No | Invocation endpoint path (default: \"invocations\") |\n| `--debug` | FLAG | No | Enable debug mode (default: false) |\n\n\n#### Invoke a Custom Inference Endpoint \n\n```bash\nhyp invoke hyp-custom-endpoint \\\n --endpoint-name endpoint-custom-pytorch \\\n --body '{\"inputs\":\"What is the capital of USA?\"}'\n```\n\n#### Managing an Endpoint \n\n```bash\nhyp list hyp-custom-endpoint\nhyp describe hyp-custom-endpoint --name endpoint-custom\n```\n\n#### List Pods\n\n```bash\nhyp list-pods hyp-custom-endpoint\n```\n\n#### Get Logs\n\n```bash\nhyp get-logs hyp-custom-endpoint --pod-name <pod-name>\n```\n\n#### Get Operator Logs\n\n```bash\nhyp get-operator-logs hyp-custom-endpoint --since-hours 0.5\n```\n\n#### Deleting an Endpoint\n\n```bash\nhyp delete hyp-custom-endpoint --name endpoint-custom\n```\n\n## SDK \n\nAlong with the CLI, we also have SDKs available that can perform the cluster management, training and inference functionalities that the CLI performs\n\n### Cluster Management SDK\n\n#### Creating a Cluster Stack\n\n```python\nfrom sagemaker.hyperpod.cluster_management.hp_cluster_stack import HpClusterStack\n\n# Initialize cluster stack configuration\ncluster_stack = HpClusterStack(\n stage=\"prod\",\n resource_name_prefix=\"my-hyperpod\",\n hyperpod_cluster_name=\"my-hyperpod-cluster\",\n eks_cluster_name=\"my-hyperpod-eks\",\n \n # Infrastructure components\n create_vpc_stack=True,\n create_eks_cluster_stack=True,\n create_hyperpod_cluster_stack=True,\n \n # Network configuration\n vpc_cidr=\"10.192.0.0/16\",\n availability_zone_ids=[\"use2-az1\", \"use2-az2\"],\n \n # Instance group configuration\n instance_group_settings=[\n {\n \"InstanceCount\": 1,\n \"InstanceGroupName\": \"controller-group\",\n \"InstanceType\": \"ml.t3.medium\",\n \"TargetAvailabilityZoneId\": \"use2-az2\"\n }\n 
]\n)\n\n# Create the cluster stack\nresponse = cluster_stack.create(region=\"us-east-2\")\n```\n\n#### Listing Cluster Stacks\n\n```python\n# List all cluster stacks\nstacks = HpClusterStack.list(region=\"us-east-2\")\nprint(f\"Found {len(stacks['StackSummaries'])} stacks\")\n```\n\n#### Describing a Cluster Stack\n\n```python\n# Describe a specific cluster stack\nstack_info = HpClusterStack.describe(\"my-stack-name\", region=\"us-east-2\")\nprint(f\"Stack status: {stack_info['Stacks'][0]['StackStatus']}\")\n```\n\n#### Monitoring Cluster Status\n\n```python\nfrom sagemaker.hyperpod.cluster_management.hp_cluster_stack import HpClusterStack\n\nstack = HpClusterStack()\nresponse = stack.create(region=\"us-west-2\")\nstatus = stack.get_status(region=\"us-west-2\")\nprint(status)\n```\n\n### Training SDK\n\n#### Creating a Training Job \n\n```python\nfrom sagemaker.hyperpod.training.hyperpod_pytorch_job import HyperPodPytorchJob\nfrom sagemaker.hyperpod.training.config.hyperpod_pytorch_job_unified_config import (\n ReplicaSpec, Template, Spec, Containers, Resources, RunPolicy\n)\nfrom sagemaker.hyperpod.common.config.metadata import Metadata\n\n# Define job specifications\nnproc_per_node = \"1\" # Number of processes per node\nreplica_specs = \n[\n ReplicaSpec\n (\n name = \"pod\", # Replica name\n template = Template\n (\n spec = Spec\n (\n containers =\n [\n Containers\n (\n # Container name\n name=\"container-name\", \n \n # Training image\n image=\"123456789012.dkr.ecr.us-west-2.amazonaws.com/my-training-image:latest\", \n \n # Always pull image\n image_pull_policy=\"Always\", \n resources=Resources\\\n (\n # No GPUs requested\n requests={\"nvidia.com/gpu\": \"0\"}, \n # No GPU limit\n limits={\"nvidia.com/gpu\": \"0\"}, \n ),\n # Command to run\n command=[\"python\", \"train.py\"], \n # Script arguments\n args=[\"--epochs\", \"10\", \"--batch-size\", \"32\"], \n )\n ]\n )\n ),\n )\n]\n# Keep pods after completion\nrun_policy = RunPolicy(clean_pod_policy=\"None\") \n\n# Create and start the PyTorch job\npytorch_job = HyperPodPytorchJob\n(\n # Job name\n metadata = Metadata(name=\"demo\"), \n # Processes per node\n nproc_per_node = nproc_per_node, \n # Replica specifications\n replica_specs = replica_specs, \n # Run policy\n run_policy = run_policy, \n)\n# Launch the job\npytorch_job.create() \n``` \n\n#### List Training Jobs\n```python\nfrom sagemaker.hyperpod.training import HyperPodPytorchJob\nimport yaml\n\n# List all PyTorch jobs\njobs = HyperPodPytorchJob.list()\nprint(yaml.dump(jobs))\n```\n\n#### Describe a Training Job\n```python\nfrom sagemaker.hyperpod.training import HyperPodPytorchJob\n\n# Get an existing job\njob = HyperPodPytorchJob.get(name=\"my-pytorch-job\")\n\nprint(job)\n```\n\n#### List Pods for a Training Job\n```python\nfrom sagemaker.hyperpod.training import HyperPodPytorchJob\n\n# List Pods for an existing job\njob = HyperPodPytorchJob.get(name=\"my-pytorch-job\")\nprint(job.list_pods())\n```\n\n#### Get Logs from a Pod\n```python\nfrom sagemaker.hyperpod.training import HyperPodPytorchJob\n\n# Get pod logs for a job\njob = HyperPodPytorchJob.get(name=\"my-pytorch-job\")\nprint(job.get_logs_from_pod(\"pod-name\"))\n```\n\n#### Get Training Operator Logs\n```python\nfrom sagemaker.hyperpod.training import HyperPodPytorchJob\n\n# Get training operator logs\njob = HyperPodPytorchJob.get(name=\"my-pytorch-job\")\nprint(job.get_operator_logs(since_hours=0.1))\n```\n\n#### Delete a Training Job\n```python\nfrom sagemaker.hyperpod.training import HyperPodPytorchJob\n\n# 
#### List Training Jobs

```python
from sagemaker.hyperpod.training import HyperPodPytorchJob
import yaml

# List all PyTorch jobs
jobs = HyperPodPytorchJob.list()
print(yaml.dump(jobs))
```

#### Describe a Training Job

```python
from sagemaker.hyperpod.training import HyperPodPytorchJob

# Get an existing job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job)
```

#### List Pods for a Training Job

```python
from sagemaker.hyperpod.training import HyperPodPytorchJob

# List pods for an existing job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job.list_pods())
```

#### Get Logs from a Pod

```python
from sagemaker.hyperpod.training import HyperPodPytorchJob

# Get pod logs for a job
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job.get_logs_from_pod("pod-name"))
```

#### Get Training Operator Logs

```python
from sagemaker.hyperpod.training import HyperPodPytorchJob

# Get training operator logs
job = HyperPodPytorchJob.get(name="my-pytorch-job")
print(job.get_operator_logs(since_hours=0.1))
```

#### Delete a Training Job

```python
from sagemaker.hyperpod.training import HyperPodPytorchJob

# Get an existing job
job = HyperPodPytorchJob.get(name="my-pytorch-job")

# Delete the job
job.delete()
```

### Inference SDK

#### Creating a JumpStart Model Endpoint

Pre-trained JumpStart model IDs are listed at https://sagemaker.readthedocs.io/en/v2.82.0/doc_utils/jumpstart.html and can be passed to the endpoint creation call.

```python
from sagemaker.hyperpod.inference.config.hp_jumpstart_endpoint_config import Model, Server, SageMakerEndpoint, TlsConfig
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint

model = Model(
    model_id='deepseek-llm-r1-distill-qwen-1-5b'
)
server = Server(
    instance_type='ml.g5.8xlarge',
)
endpoint_name = SageMakerEndpoint(name='<my-endpoint-name>')

js_endpoint = HPJumpStartEndpoint(
    model=model,
    server=server,
    sage_maker_endpoint=endpoint_name
)

js_endpoint.create()
```
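The docs page above lists model IDs statically. As an aside, the SageMaker Python SDK (the `sagemaker` package, a separate dependency from the HyperPod CLI/SDK) can enumerate JumpStart model IDs programmatically; a minimal sketch:

```python
# Requires the sagemaker package; this utility is part of the SageMaker
# Python SDK, not the HyperPod CLI/SDK.
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

model_ids = list_jumpstart_models()
print(model_ids[:10])  # first few JumpStart model IDs
```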
#### Creating a Custom Inference Endpoint (with S3)

```python
from sagemaker.hyperpod.inference.config.hp_endpoint_config import (
    CloudWatchTrigger, Dimensions, AutoScalingSpec, Metrics, S3Storage,
    ModelSourceConfig, TlsConfig, EnvironmentVariables, ModelInvocationPort,
    ModelVolumeMount, Resources, Worker,
)
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

model_source_config = ModelSourceConfig(
    model_source_type='s3',
    model_location="<my-model-folder-in-s3>",
    s3_storage=S3Storage(
        bucket_name='<my-model-artifacts-bucket>',
        region='us-east-2',
    ),
)

environment_variables = [
    EnvironmentVariables(name="HF_MODEL_ID", value="/opt/ml/model"),
    EnvironmentVariables(name="SAGEMAKER_PROGRAM", value="inference.py"),
    EnvironmentVariables(name="SAGEMAKER_SUBMIT_DIRECTORY", value="/opt/ml/model/code"),
    EnvironmentVariables(name="MODEL_CACHE_ROOT", value="/opt/ml/model"),
    EnvironmentVariables(name="SAGEMAKER_ENV", value="1"),
]

worker = Worker(
    image='763104351884.dkr.ecr.us-east-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.4.0-tgi2.3.1-gpu-py311-cu124-ubuntu22.04-v2.0',
    model_volume_mount=ModelVolumeMount(
        name='model-weights',
    ),
    model_invocation_port=ModelInvocationPort(container_port=8080),
    resources=Resources(
        requests={"cpu": "30000m", "nvidia.com/gpu": 1, "memory": "100Gi"},
        limits={"nvidia.com/gpu": 1}
    ),
    environment_variables=environment_variables,
)

tls_config = TlsConfig(tls_certificate_output_s3_uri='s3://<my-tls-bucket-name>')

custom_endpoint = HPEndpoint(
    endpoint_name='<my-endpoint-name>',
    instance_type='ml.g5.8xlarge',
    model_name='deepseek15b-test-model-name',
    tls_config=tls_config,
    model_source_config=model_source_config,
    worker=worker,
)

custom_endpoint.create()
```

#### List Endpoints

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

# List JumpStart endpoints
jumpstart_endpoints = HPJumpStartEndpoint.list()
print(jumpstart_endpoints)

# List custom endpoints
custom_endpoints = HPEndpoint.list()
print(custom_endpoints)
```

#### Describe an Endpoint

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

# Get JumpStart endpoint details
jumpstart_endpoint = HPJumpStartEndpoint.get(name="js-endpoint-name", namespace="test")
print(jumpstart_endpoint)

# Get custom endpoint details
custom_endpoint = HPEndpoint.get(name="endpoint-custom")
print(custom_endpoint)
```

#### Invoke an Endpoint

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

data = '{"inputs":"What is the capital of USA?"}'

# Invoke a JumpStart endpoint
jumpstart_endpoint = HPJumpStartEndpoint.get(name="endpoint-jumpstart")
response = jumpstart_endpoint.invoke(body=data).body.read()
print(response)

# Invoke a custom endpoint
custom_endpoint = HPEndpoint.get(name="endpoint-custom")
response = custom_endpoint.invoke(body=data).body.read()
print(response)
```
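`invoke(...).body.read()` returns raw bytes. Serving containers such as TGI typically respond with JSON, so a minimal decoding sketch (the exact payload shape depends on the container):

```python
import json

# `response` holds the raw bytes read above; json.loads accepts bytes directly.
result = json.loads(response)
print(result)
```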
#### List Pods

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

# List pods for JumpStart endpoints
js_pods = HPJumpStartEndpoint.list_pods()
print(js_pods)

# List pods for custom endpoints
c_pods = HPEndpoint.list_pods()
print(c_pods)
```

#### Get Logs

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

# Get logs from a pod
js_logs = HPJumpStartEndpoint.get_logs(pod="<pod-name>")
print(js_logs)

c_logs = HPEndpoint.get_logs(pod="<pod-name>")
print(c_logs)
```

#### Get Operator Logs

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

# Get operator logs for JumpStart endpoints
print(HPJumpStartEndpoint.get_operator_logs(since_hours=0.1))

# Get operator logs for custom endpoints
print(HPEndpoint.get_operator_logs(since_hours=0.1))
```

#### Delete an Endpoint

```python
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_endpoint import HPEndpoint

# Delete a JumpStart endpoint
jumpstart_endpoint = HPJumpStartEndpoint.get(name="endpoint-jumpstart")
jumpstart_endpoint.delete()

# Delete a custom endpoint
custom_endpoint = HPEndpoint.get(name="endpoint-custom")
custom_endpoint.delete()
```

#### Observability - Getting Monitoring Information

```python
from sagemaker.hyperpod.observability.utils import get_monitoring_config

monitor_config = get_monitoring_config()
```

## Examples

#### Cluster Management Example Notebooks

[CLI Cluster Management Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/cluster_management/cluster_creation_init_experience.ipynb)

[SDK Cluster Management Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/cluster_management/cluster_creation_sdk_experience.ipynb)

#### Training Example Notebooks

[CLI Training Init Experience Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/training/CLI/training-init-experience.ipynb)

[CLI Training Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/training/CLI/training-e2e-cli.ipynb)

[SDK Training Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/training/SDK/training_sdk_example.ipynb)

#### Inference Example Notebooks

##### CLI

[CLI Inference Jumpstart Model Init Experience Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-jumpstart-init-experience.ipynb)

[CLI Inference JumpStart Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-jumpstart-e2e-cli.ipynb)

[CLI Inference FSX Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-fsx-model-e2e-cli.ipynb)

[CLI Inference S3 Model Init Experience Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-s3-model-init-experience.ipynb)

[CLI Inference S3 Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/CLI/inference-s3-model-e2e-cli.ipynb)

##### SDK

[SDK Inference JumpStart Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/SDK/inference-jumpstart-e2e.ipynb)

[SDK Inference FSX Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/SDK/inference-fsx-model-e2e.ipynb)

[SDK Inference S3 Model Example](https://github.com/aws/sagemaker-hyperpod-cli/blob/main/examples/inference/SDK/inference-s3-model-e2e.ipynb)

## Disclaimer

* The CLI and SDK require access to the user's file system to set and get context and to function properly. They need to read configuration files such as kubeconfig to establish the necessary environment settings.

## Working behind a proxy server?

* Follow the steps in the [AWS CLI proxy configuration guide](https://docs.aws.amazon.com/cli/v1/userguide/cli-configure-proxy.html) to set up HTTP proxy connections.
"bugtrack_url": null,
"license": "Apache-2.0",
"summary": "Amazon SageMaker HyperPod SDK and CLI",
"version": "3.3.1",
"project_urls": {
"Homepage": "https://github.com/aws/sagemaker-hyperpod-cli"
},
"split_keywords": [],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "1c48a9ff5524eb08143aa5cd479d23774d406e3474a81370a8b6803dfa5a8110",
"md5": "9b53479a05d90c54976f5971ce057be4",
"sha256": "cc6a345e43449848bb0ba33749098e3a1f865e1ef34cc625c03ccdc9090fc215"
},
"downloads": -1,
"filename": "sagemaker_hyperpod-3.3.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "9b53479a05d90c54976f5971ce057be4",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 5342297,
"upload_time": "2025-10-30T20:47:59",
"upload_time_iso_8601": "2025-10-30T20:47:59.353686Z",
"url": "https://files.pythonhosted.org/packages/1c/48/a9ff5524eb08143aa5cd479d23774d406e3474a81370a8b6803dfa5a8110/sagemaker_hyperpod-3.3.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "f8ea53bb571418027697a87809c273f4aca95ff3527a81bcbe988f9dd946ce8e",
"md5": "61014da6915179e4a4cc9972e49ca894",
"sha256": "4f5fd45c0f7e30435a4d9a0c6f6e8b30a37905dd8c07be4861c876e0a22dbcf9"
},
"downloads": -1,
"filename": "sagemaker_hyperpod-3.3.1.tar.gz",
"has_sig": false,
"md5_digest": "61014da6915179e4a4cc9972e49ca894",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 1446307,
"upload_time": "2025-10-30T20:48:01",
"upload_time_iso_8601": "2025-10-30T20:48:01.142713Z",
"url": "https://files.pythonhosted.org/packages/f8/ea/53bb571418027697a87809c273f4aca95ff3527a81bcbe988f9dd946ce8e/sagemaker_hyperpod-3.3.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-30 20:48:01",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "aws",
"github_project": "sagemaker-hyperpod-cli",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"tox": true,
"lcname": "sagemaker-hyperpod"
}