vertex-deployer


Name: vertex-deployer
Version: 0.5.4
Home page: https://github.com/artefactory/vertex-pipelines-deployer
Summary: Check, compile, upload, run, and schedule Kubeflow Pipelines on GCP Vertex AI in a standardized manner.
Upload time: 2024-10-11 20:26:32
Author: artefactory
Requires Python: <3.13.0,>=3.8
License: Apache-2.0
Keywords: kubeflow, vertexai, aiplatform, gcp, mlops, deployer, pipeline
            <br />
<div align="center">
    <h1 align="center">Vertex Pipelines Deployer</h1>
    <p align="center">
        <a href="https://www.artefact.com/">
        <img src="docs/assets/logo.svg" style="max-width:50%;height:auto;background-color:#111146;" alt="Artefact Skaff Logo"/>
        </a>
    </p>
    <h3 align="center">Deploy Vertex Pipelines within minutes</h3>
        <p align="center">
        This tool is a wrapper around <a href="https://www.kubeflow.org/docs/components/pipelines/v2/hello-world/">kfp</a> and <a href="https://cloud.google.com/python/docs/reference/aiplatform/latest">google-cloud-aiplatform</a> that allows you to check, compile, upload, run, and schedule Vertex Pipelines in a standardized manner.
        </p>
</div>
<br />

<!-- PROJECT SHIELDS -->
<div align="center">

![PyPI - Python Version](https://img.shields.io/pypi/pyversions/vertex-deployer?logo=python)
![PyPI - Status](https://img.shields.io/pypi/v/vertex-deployer)
![PyPI - Downloads](https://img.shields.io/pypi/dm/vertex-deployer?color=blue)
![PyPI - License](https://img.shields.io/pypi/l/vertex-deployer)

[![CI](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/ci.yaml/badge.svg?branch=main&event=push)](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/ci.yaml)
[![Release](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/release.yaml/badge.svg?branch=main&event=push)](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/release.yaml)

[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-informational?logo=pre-commit&logoColor=white)](https://github.com/ornikar/vertex-eduscore/blob/develop/.pre-commit-config.yaml)
[![Linting: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat)](https://pycqa.github.io/isort/)

</div>


<details>
  <summary>📚 Table of Contents</summary>
  <ol>
    <li><a href="#-why-this-tool">Why this tool?</a></li>
    <li><a href="#-prerequisites">Prerequisites</a></li>
    <li><a href="#-installation">Installation</a></li>
        <ol>
            <li><a href="#from-pypi">From PyPI</a></li>
            <li><a href="#from-git-repo">From git repo</a></li>
        </ol>
    <li><a href="#-usage">Usage</a></li>
        <ol>
            <li><a href="#-setup">Setup</a></li>
            <li><a href="#-folder-structure">Folder Structure</a></li>
            <li><a href="#-cli-deploying-a-pipeline-with-deploy">CLI: Deploying a Pipeline with `deploy`</a></li>
            <li><a href="#-cli-checking-pipelines-are-valid-with-check">CLI: Checking Pipelines are valid with `check`</a></li>
            <li><a href="#-cli-other-commands">CLI: Other commands</a></li>
                <ol>
                    <li><a href="#config">`config`</a></li>
                    <li><a href="#create">`create`</a></li>
                    <li><a href="#init">`init`</a></li>
                    <li><a href="#list">`list`</a></li>
                </ol>
        </ol>
    <li><a href="#cli-options">CLI: Options</a></li>
    <li><a href="#configuration">Configuration</a></li>
  </ol>
</details>


[Full CLI documentation](docs/CLI_REFERENCE.md)


## ❓ Why this tool?
<!-- --8<-- [start:why] -->

Three use cases:

1. **CI:** Check pipeline validity.
2. **Dev mode:** Quickly iterate over your pipelines by compiling and running them in multiple environments (test, dev, staging, etc.) without duplicating code or searching for the right kfp/aiplatform snippet.
3. **CD:** Deploy your pipelines to Vertex Pipelines in a standardized manner in your CD with Cloud Build or GitHub Actions.


Two main commands:

- `check`: Check your pipelines (imports, compile, check configs validity against pipeline definition).
- `deploy`: Compile, upload to Artifact Registry, run, and schedule your pipelines.

<!-- --8<-- [end:why] -->

## 📋 Prerequisites
<!-- --8<-- [start:prerequisites] -->

- Unix-like environment (Linux, macOS, WSL, etc.)
- Python 3.8 to 3.12
- Google Cloud SDK
- A GCP project with Vertex Pipelines enabled
<!-- --8<-- [end:prerequisites] -->

## 📦 Installation
<!-- --8<-- [start:installation] -->
### From PyPI

```bash
pip install vertex-deployer
```

### From git repo

Stable version:
```bash
pip install git+https://github.com/artefactory/vertex-pipelines-deployer.git@main
```

Develop version:
```bash
pip install git+https://github.com/artefactory/vertex-pipelines-deployer.git@develop
```

If you want to test this package on examples from this repo:
```bash
git clone git@github.com:artefactory/vertex-pipelines-deployer.git
cd vertex-pipelines-deployer
poetry install
poetry shell  # if you want to activate the virtual environment
cd example
```
<!-- --8<-- [end:installation] -->

## 🚀 Usage
<!-- --8<-- [start:setup] -->
### 🛠️ Setup

1. Set up your GCP environment:
```bash
export PROJECT_ID=<gcp_project_id>
gcloud config set project $PROJECT_ID
gcloud auth login
gcloud auth application-default login
```

2. You need the following APIs to be enabled:
- Cloud Build API
- Artifact Registry API
- Cloud Storage API
- Vertex AI API
```bash
gcloud services enable \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    storage.googleapis.com \
    aiplatform.googleapis.com
```

3. Create an artifact registry repository for your base images (Docker format):
```bash
export GAR_DOCKER_REPO_ID=<your_gar_repo_id_for_images>
export GAR_LOCATION=<your_gar_location>
gcloud artifacts repositories create ${GAR_DOCKER_REPO_ID} \
    --location=${GAR_LOCATION} \
    --repository-format=docker
```

4. Build and upload your base images to the repository. To do so, please follow Google Cloud Build documentation.

5. Create an artifact registry repository for your pipelines (KFP format):
```bash
export GAR_PIPELINES_REPO_ID=<your_gar_repo_id_for_pipelines>
gcloud artifacts repositories create ${GAR_PIPELINES_REPO_ID} \
    --location=${GAR_LOCATION} \
    --repository-format=kfp
```

6. Create a GCS bucket for Vertex Pipelines staging:
```bash
export GCP_REGION=<your_gcp_region>
export VERTEX_STAGING_BUCKET_NAME=<your_bucket_name>
gcloud storage buckets create gs://${VERTEX_STAGING_BUCKET_NAME} --location=${GCP_REGION}
```

7. Create a service account for Vertex Pipelines:
```bash
export VERTEX_SERVICE_ACCOUNT_NAME=foobar
export VERTEX_SERVICE_ACCOUNT="${VERTEX_SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

gcloud iam service-accounts create ${VERTEX_SERVICE_ACCOUNT_NAME}

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
    --member="serviceAccount:${VERTEX_SERVICE_ACCOUNT}" \
    --role="roles/aiplatform.user"

gcloud storage buckets add-iam-policy-binding gs://${VERTEX_STAGING_BUCKET_NAME} \
    --member="serviceAccount:${VERTEX_SERVICE_ACCOUNT}" \
    --role="roles/storage.objectUser"

gcloud artifacts repositories add-iam-policy-binding ${GAR_PIPELINES_REPO_ID} \
   --location=${GAR_LOCATION} \
   --member="serviceAccount:${VERTEX_SERVICE_ACCOUNT}" \
   --role="roles/artifactregistry.admin"
```
<!-- --8<-- [end:setup] -->
You can use the deployer CLI (see example below) or import [`VertexPipelineDeployer`](deployer/pipeline_deployer.py) in your code (try it yourself).

### 📁 Folder Structure

<!-- --8<-- [start:folder_structure] -->
You must respect the following folder structure. If you already follow the
[Vertex Pipelines Starter Kit folder structure](https://github.com/artefactory/vertex-pipeline-starter-kit), it should be pretty smooth to use this tool:

```
vertex
├─ configs/
│  └─ {pipeline_name}
│     └─ {config_name}.json
└─ pipelines/
   └─ {pipeline_name}.py
```

!!! tip "About folder structure"
    You must have at least these files. If you need to share some config elements between pipelines,
    you can have a `shared` folder in `configs` and import them in your pipeline configs.

    If you're following a different folder structure, you can change the default paths in the `pyproject.toml` file.
    See [Configuration](#configuration) section for more information.

#### Pipelines

Your file `{pipeline_name}.py` must contain a function called `{pipeline_name}` decorated using `kfp.dsl.pipeline`.
In previous versions, this function had to be called `pipeline`. It was renamed to `{pipeline_name}` to avoid confusion with the `kfp.dsl.pipeline` decorator.

```python
# vertex/pipelines/dummy_pipeline.py
import kfp.dsl

# New name to avoid confusion with the kfp.dsl.pipeline decorator
@kfp.dsl.pipeline()
def dummy_pipeline():
    ...

# Old name
@kfp.dsl.pipeline()
def pipeline():
    ...
```
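
The naming rule above can be checked without importing the module. The following is a minimal sketch using only the standard library; `defines_pipeline` is a hypothetical helper written for illustration and is not part of the deployer, which may perform this check differently.

```python
import ast


def defines_pipeline(source: str, pipeline_name: str) -> bool:
    """Return True if `source` has a top-level function named `pipeline_name`."""
    tree = ast.parse(source)  # parses only, so kfp does not need to be installed
    return any(
        isinstance(node, ast.FunctionDef) and node.name == pipeline_name
        for node in tree.body
    )


# Hypothetical contents of vertex/pipelines/dummy_pipeline.py
new_style = "import kfp.dsl\n\n@kfp.dsl.pipeline()\ndef dummy_pipeline():\n    ...\n"
old_style = "import kfp.dsl\n\n@kfp.dsl.pipeline()\ndef pipeline():\n    ...\n"

print(defines_pipeline(new_style, "dummy_pipeline"))
print(defines_pipeline(old_style, "dummy_pipeline"))
```

The first call reports that the file follows the current naming convention; the second shows that a file still using the old `pipeline` name does not.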

#### Configs

Config files can be in `.py`, `.json`, `.toml`, or `.yaml` format.
They must be located in the `configs/{pipeline_name}` folder.

**Why multiple formats?**

`.py` files are useful to define complex configs (e.g. a list of dicts) while `.json` / `.toml` / `.yaml` files are useful to define simple configs (e.g. a string).
Supporting several formats also gives you flexibility and lets you adopt the deployer with almost no migration cost.

**How to format them?**

- `.py` files must be valid python files with two important elements:

    * `parameter_values` to pass arguments to your pipeline
    * `input_artifacts` if you want to retrieve and create input artifacts to your pipeline.
    See [Vertex Documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob) for more information.

- `.json` files must be valid json files containing only one dict of key: value representing parameter values.
- `.toml` files must be the same. Please note that TOML sections will be flattened, except for inline tables.
    Section names will be joined using `"_"` separator and this is not configurable at the moment.
    Example:

    === "TOML file"
        ```toml
        [modeling]
        model_name = "my-model"
        params = { lambda = 0.1 }
        ```

    === "Resulting parameter values"
        ```python
        {
            "modeling_model_name": "my-model",
            "modeling_params": { "lambda": 0.1 }
        }
        ```

- `.yaml` files must be valid yaml files containing only one dict of key: value representing parameter values.
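
To illustrate the TOML flattening rule described above, here is a minimal sketch that works on already-parsed data. It is not the deployer's implementation: `InlineTable` and `flatten_sections` are hypothetical names, and the marker class stands in for whatever the TOML parser uses to distinguish inline tables from regular sections.

```python
from typing import Any, Dict


class InlineTable(dict):
    """Hypothetical marker for a TOML inline table, e.g. `params = { lambda = 0.1 }`."""


def flatten_sections(data: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested sections into `section_key` names, keeping inline tables as dicts."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and not isinstance(value, InlineTable):
            # Regular section: recurse and join names with the separator.
            flat.update(flatten_sections(value, name, sep))
        elif isinstance(value, InlineTable):
            # Inline table: keep it as a single dict-valued parameter.
            flat[name] = dict(value)
        else:
            flat[name] = value
    return flat


# Parsed form of the TOML example above.
parsed = {"modeling": {"model_name": "my-model", "params": InlineTable({"lambda": 0.1})}}
flattened = flatten_sections(parsed)
print(flattened)
```

With the example input, the result has the two keys `modeling_model_name` and `modeling_params`, matching the "Resulting parameter values" tab.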

??? question "Why are sections flattened when using TOML config files?"
    Vertex Pipelines parameter validation and parameter logging to Vertex Experiments are based on the parameter name.
    If you do not flatten your sections, you'll only be able to validate section names and that they should be of type `dict`.

    Not very useful.

??? question "Why aren't `input_artifacts` supported in TOML / JSON config files?"
    Because it's low on the priority list. Feel free to open a PR if you want to add it.


**How to name them?**

`{config_name}.py`, `{config_name}.json`, `{config_name}.toml`, or `{config_name}.yaml`. You can choose any `config_name`, but it must be unique for a given pipeline.
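
For the `.py` format, a config file could look like the sketch below. Only the two variable names `parameter_values` and `input_artifacts` come from the description above; the file name, parameter names, and values are hypothetical placeholders.

```python
# vertex/configs/dummy_pipeline/config_test.py (hypothetical file name)

# Arguments passed to the pipeline function; the keys are assumed to match
# the parameters of a hypothetical `dummy_pipeline(model_name, params)`.
parameter_values = {
    "model_name": "my-model",
    "params": {"lambda": 0.1},
}

# Input artifacts for the pipeline; left empty here because this sketch
# does not assume any existing artifact.
input_artifacts = {}
```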


#### Settings

You will also need the following ENV variables, either exported or in a `.env` file (see example in `example.env`):

```bash
PROJECT_ID=YOUR_PROJECT_ID  # GCP Project ID
GCP_REGION=europe-west1  # GCP Region

GAR_LOCATION=europe-west1  # Google Artifact Registry Location
GAR_PIPELINES_REPO_ID=YOUR_GAR_KFP_REPO_ID  # Google Artifact Registry Repo ID (KFP format)

VERTEX_STAGING_BUCKET_NAME=YOUR_VERTEX_STAGING_BUCKET_NAME  # GCS Bucket for Vertex Pipelines staging
VERTEX_SERVICE_ACCOUNT=YOUR_VERTEX_SERVICE_ACCOUNT  # Vertex Pipelines Service Account
```
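
Before deploying, it can help to verify that all of these variables are defined. The sketch below is one possible way to do that with the standard library; `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, and the deployer itself may validate its settings differently.

```python
import os
from typing import List, Mapping

# Variable names taken from the list above.
REQUIRED_SETTINGS = [
    "PROJECT_ID",
    "GCP_REGION",
    "GAR_LOCATION",
    "GAR_PIPELINES_REPO_ID",
    "VERTEX_STAGING_BUCKET_NAME",
    "VERTEX_SERVICE_ACCOUNT",
]


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


missing = missing_settings(os.environ)
if missing:
    print("Missing settings: " + ", ".join(missing))
else:
    print("All required settings are defined.")
```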

!!! note "About env files"
    We're using env files and dotenv to load the environment variables.
    No default value for `--env-file` argument is provided to ensure that you don't accidentally deploy to the wrong project.
    An [`example.env`](./example/example.env) file is provided in this repo.
    Env files also let you work with multiple environments (`test.env`, `dev.env`, `prod.env`, etc.).
<!-- --8<-- [end:folder_structure] -->

<!-- --8<-- [start:usage] -->
### 🚀 CLI: Deploying a Pipeline with `deploy`

Let's say you defined a pipeline in `dummy_pipeline.py` and a config file named `config_test.json`. You can deploy your pipeline using the following command:
```bash
vertex-deployer deploy dummy_pipeline \
    --compile \
    --upload \
    --run \
    --env-file example.env \
    --tags my-tag \
    --config-filepath vertex/configs/dummy_pipeline/config_test.json \
    --experiment-name my-experiment \
    --enable-caching \
    --skip-validation
```

### ✅ CLI: Checking Pipelines are valid with `check`

To check that your pipelines are valid, you can use the `check` command. It will:
- check that your pipeline imports and definition are valid
- check that your pipeline can be compiled
- check that all configs related to the pipeline respect the pipeline definition (using a Pydantic model based on the pipeline signature)

To validate one or multiple pipeline(s):
```bash
vertex-deployer check dummy_pipeline <other pipeline name>
```

To validate all pipelines in the `vertex/pipelines` folder:
```bash
vertex-deployer check --all
```
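
The idea of validating a config against the pipeline signature can be approximated with the standard library. The sketch below uses `inspect` instead of Pydantic, so it is a simplified stand-in rather than what `check` actually does; `check_config` and the example pipeline parameters are hypothetical.

```python
import inspect
from typing import Any, Callable, Dict, List


def check_config(pipeline_func: Callable, parameter_values: Dict[str, Any]) -> List[str]:
    """Compare config values with the pipeline signature and return error messages."""
    errors: List[str] = []
    params = inspect.signature(pipeline_func).parameters
    for name, param in params.items():
        if name not in parameter_values:
            if param.default is inspect.Parameter.empty:
                errors.append(f"missing required parameter: {name}")
            continue
        expected = param.annotation
        value = parameter_values[name]
        # Only plain classes (str, int, float, ...) are checked in this sketch.
        if (
            expected is not inspect.Parameter.empty
            and isinstance(expected, type)
            and not isinstance(value, expected)
        ):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in parameter_values:
        if name not in params:
            errors.append(f"unexpected parameter: {name}")
    return errors


# Stand-in for a pipeline function (the kfp decorator is omitted on purpose).
def dummy_pipeline(model_name: str, learning_rate: float = 0.1):
    ...


print(check_config(dummy_pipeline, {"model_name": "my-model"}))
print(check_config(dummy_pipeline, {"model_name": 3, "epochs": 10}))
```

A valid config yields an empty list, while a wrong type or an unknown key each produce one message. A Pydantic model built from the signature gives richer validation (type coercion, nested types) than this check.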


### 🛠️ CLI: Other commands

#### `config`

You can check your `vertex-deployer` configuration options using the `config` command.
Fields set in `pyproject.toml` will overwrite default values and will be displayed differently:
```bash
vertex-deployer config --all
```

#### `create`

You can create all files needed for a pipeline using the `create` command:
```bash
vertex-deployer create my_new_pipeline --config-type py
```

This will create a `my_new_pipeline.py` file in the `vertex/pipelines` folder and a `vertex/configs/my_new_pipeline/` folder with multiple config files in it.

#### `init`

To initialize the deployer with default settings and folder structure, use the `init` command:
```bash
vertex-deployer init
```

```bash
$ vertex-deployer init
Welcome to Vertex Deployer!
This command will help you getting fired up.
Do you want to configure the deployer? [y/n]: n
Do you want to build default folder structure [y/n]: n
Do you want to create a pipeline? [y/n]: n
All done ✨
```

#### `list`

You can list all pipelines in the `vertex/pipelines` folder using the `list` command:
```bash
vertex-deployer list --with-configs
```
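
Conceptually, listing pipelines with their configs amounts to walking the folder structure described earlier. The following sketch shows that idea with `pathlib`; `list_pipelines` is a hypothetical helper, not the CLI's implementation, and it builds a temporary folder only to have something to scan.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def list_pipelines(vertex_folder: Path) -> Dict[str, List[str]]:
    """Map each pipeline name to the config file names found for it."""
    pipelines: Dict[str, List[str]] = {}
    for pipeline_file in sorted((vertex_folder / "pipelines").glob("*.py")):
        name = pipeline_file.stem
        config_dir = vertex_folder / "configs" / name
        if config_dir.is_dir():
            configs = sorted(p.name for p in config_dir.iterdir() if p.is_file())
        else:
            configs = []
        pipelines[name] = configs
    return pipelines


# Build a throwaway folder that follows the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    vertex = Path(tmp) / "vertex"
    (vertex / "pipelines").mkdir(parents=True)
    (vertex / "configs" / "dummy_pipeline").mkdir(parents=True)
    (vertex / "pipelines" / "dummy_pipeline.py").touch()
    (vertex / "configs" / "dummy_pipeline" / "config_test.json").touch()
    found = list_pipelines(vertex)

print(found)
```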

### 🍭 CLI: Options

```bash
vertex-deployer --help
```

To see package version:
```bash
vertex-deployer --version
```

To change the log level, use the `--log-level` option. The default is `INFO`.
```bash
vertex-deployer --log-level DEBUG deploy ...
```

<!-- --8<-- [end:usage] -->

## Configuration

You can configure the deployer using the `pyproject.toml` file to better fit your needs.
This will override the default values. It can be useful if you always use the same options, e.g. always the same `--scheduler-timezone`.

```toml
[tool.vertex-deployer]
vertex_folder_path = "my/path/to/vertex"
log_level = "INFO"

[tool.vertex-deployer.deploy]
scheduler_timezone = "Europe/Paris"
```

You can display all the configurable parameters with their default values by running:
```bash
$ vertex-deployer config --all
'*' means the value was set in config file

* vertex_folder_path=my/path/to/vertex
* log_level=INFO
deploy
  env_file=None
  compile=True
  upload=False
  run=False
  schedule=False
  cron=None
  delete_last_schedule=False
  * scheduler_timezone=Europe/Paris
  tags=['latest']
  config_filepath=None
  config_name=None
  enable_caching=False
  experiment_name=None
check
  all=False
  config_filepath=None
  raise_error=False
list
  with_configs=True
create
  config_type=json
```

## Repository Structure

```
├─ .github
│  ├─ ISSUE_TEMPLATE/
│  ├─ workflows
│  │  ├─ ci.yaml
│  │  ├─ pr_agent.yaml
│  │  └─ release.yaml
│  ├─ CODEOWNERS
│  └─ PULL_REQUEST_TEMPLATE.md
├─ deployer                                     # Source code
│  ├─ __init__.py
│  ├─ cli.py
│  ├─ constants.py
│  ├─ pipeline_checks.py
│  ├─ pipeline_deployer.py
│  ├─ settings.py
│  └─ utils
│     ├─ config.py
│     ├─ console.py
│     ├─ exceptions.py
│     ├─ logging.py
│     ├─ models.py
│     └─ utils.py
├─ docs/                                        # Documentation folder (mkdocs)
├─ templates/                                   # Semantic Release templates
├─ tests/
├─ example                                      # Example folder with dummy pipeline and config
│   ├─ example.env
│   └─ vertex
│      ├─ components
│      │  └─ dummy.py
│      ├─ configs
│      │  ├─ broken_pipeline
│      │  │  └─ config_test.json
│      │  └─ dummy_pipeline
│      │     ├─ config_test.json
│      │     ├─ config.py
│      │     └─ config.toml
│      ├─ deployment
│      ├─ lib
│      └─ pipelines
│         ├─ broken_pipeline.py
│         └─ dummy_pipeline.py
├─ .gitignore
├─ .pre-commit-config.yaml
├─ catalog-info.yaml                            # Roadie integration configuration
├─ CHANGELOG.md
├─ CONTRIBUTING.md
├─ LICENSE
├─ Makefile
├─ mkdocs.yml                                   # Mkdocs configuration
├─ pyproject.toml
└─ README.md
```


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/artefactory/vertex-pipelines-deployer",
    "name": "vertex-deployer",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13.0,>=3.8",
    "maintainer_email": null,
    "keywords": "kubeflow, vertexai, aiplatform, gcp, mlops, deployer, pipeline",
    "author": "artefactory",
    "author_email": "jules.bertrand@artefact.com",
    "download_url": "https://files.pythonhosted.org/packages/0d/97/bdda25246c00f1f2fc4e4d71324d8efd7f3f54438460341bbcc4979cbe9d/vertex_deployer-0.5.4.tar.gz",
    "platform": null,
    "description": "<br />\n<div align=\"center\">\n    <h1 align=\"center\">Vertex Pipelines Deployer</h1>\n    <p align=\"center\">\n        <a href=\"https://www.artefact.com/\">\n        <img src=\"docs/assets/logo.svg\" style=\"max-width:50%;height:auto;background-color:#111146;\" alt=\"Artefact Skaff Logo\"/>\n        </a>\n    </p>\n    <h3 align=\"center\">Deploy Vertex Pipelines within minutes</h3>\n        <p align=\"center\">\n        This tool is a wrapper around <a href=\"https://www.kubeflow.org/docs/components/pipelines/v2/hello-world/\">kfp</a> and <a href=\"https://cloud.google.com/python/docs/reference/aiplatform/latest\">google-cloud-aiplatform</a> that allows you to check, compile, upload, run, and schedule Vertex Pipelines in a standardized manner.\n        </p>\n</div>\n<br />\n\n<!-- PROJECT SHIELDS -->\n<div align=\"center\">\n\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/vertex-deployer?logo=python)\n![PyPI - Status](https://img.shields.io/pypi/v/vertex-deployer)\n![PyPI - Downloads](https://img.shields.io/pypi/dm/vertex-deployer?color=blue)\n![PyPI - License](https://img.shields.io/pypi/l/vertex-deployer)\n\n[![CI](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/ci.yaml/badge.svg?branch=main&event=push)](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/ci.yaml)\n[![Release](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/release.yaml/badge.svg?branch=main&event=push)](https://github.com/artefactory/vertex-pipelines-deployer/actions/workflows/release.yaml)\n\n[![Pre-commit](https://img.shields.io/badge/pre--commit-enabled-informational?logo=pre-commit&logoColor=white)](https://github.com/ornikar/vertex-eduscore/blob/develop/.pre-commit-config.yaml)\n[![Linting: ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![Imports: 
isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat)](https://pycqa.github.io/isort/)\n\n</div>\n\n\n<details>\n  <summary>\ud83d\udcda Table of Contents</summary>\n  <ol>\n    <li><a href=\"#-why-this-tool\">Why this tool?</a></li>\n    <li><a href=\"#-prerequisites\">Prerequisites</a></li>\n    <li><a href=\"#-installation\">Installation</a></li>\n        <ol>\n            <li><a href=\"#from-git-repo\">From git repo</a></li>\n            <li><a href=\"#from-artifact-registry-not-available-in-pypi-yet\">From Artifact Registry (not available in PyPI yet)</a></li>\n            <li><a href=\"#add-to-requirements\">Add to requirements</a></li>\n        </ol>\n    <li><a href=\"#-usage\">Usage</a></li>\n        <ol>\n            <li><a href=\"#-setup\">Setup</a></li>\n            <li><a href=\"#-folder-structure\">Folder Structure</a></li>\n            <li><a href=\"#-cli-deploying-a-pipeline-with-deploy\">CLI: Deploying a Pipeline with `deploy`</a></li>\n            <li><a href=\"#-cli-checking-pipelines-are-valid-with-check\">CLI: Checking Pipelines are valid with `check`</a></li>\n            <li><a href=\"#-cli-other-commands\">CLI: Other commands</a></li>\n                <ol>\n                    <li><a href=\"#config\">`config`</a></li>\n                    <li><a href=\"#create\">`create`</a></li>\n                    <li><a href=\"#init\">`init`</a></li>\n                    <li><a href=\"#list\">`list`</a></li>\n                </ol>\n        </ol>\n    <li><a href=\"#cli-options\">CLI: Options</a></li>\n    <li><a href=\"#configuration\">Configuration</a></li>\n  </ol>\n</details>\n\n\n[Full CLI documentation](docs/CLI_REFERENCE.md)\n\n\n## \u2753 Why this tool?\n<!-- --8<-- [start:why] -->\n\nThree use cases:\n\n1. **CI:** Check pipeline validity.\n2. **Dev mode:** Quickly iterate over your pipelines by compiling and running them in multiple environments (test, dev, staging, etc.) 
without duplicating code or searching for the right kfp/aiplatform snippet.\n3. **CD:** Deploy your pipelines to Vertex Pipelines in a standardized manner in your CD with Cloud Build or GitHub Actions.\n\n\nTwo main commands:\n\n- `check`: Check your pipelines (imports, compile, check configs validity against pipeline definition).\n- `deploy`: Compile, upload to Artifact Registry, run, and schedule your pipelines.\n\n<!-- --8<-- [end:why] -->\n\n## \ud83d\udccb Prerequisites\n<!-- --8<-- [start:prerequisites] -->\n\n- Unix-like environment (Linux, macOS, WSL, etc.)\n- Python 3.8 to 3.10\n- Google Cloud SDK\n- A GCP project with Vertex Pipelines enabled\n<!-- --8<-- [end:prerequisites] -->\n\n## \ud83d\udce6 Installation\n<!-- --8<-- [start:installation] -->\n### From PyPI\n\n```bash\npip install vertex-deployer\n```\n\n### From git repo\n\nStable version:\n```bash\npip install git+https://github.com/artefactory/vertex-pipelines-deployer.git@main\n```\n\nDevelop version:\n```bash\npip install git+https://github.com/artefactory/vertex-pipelines-deployer.git@develop\n```\n\nIf you want to test this package on examples from this repo:\n```bash\ngit clone git@github.com:artefactory/vertex-pipelines-deployer.git\npoetry install\npoetry shell  # if you want to activate the virtual environment\ncd example\n```\n<!-- --8<-- [end:installation] -->\n\n## \ud83d\ude80 Usage\n<!-- --8<-- [start:setup] -->\n### \ud83d\udee0\ufe0f Setup\n\n1. Setup your GCP environment:\n```bash\nexport PROJECT_ID=<gcp_project_id>\ngcloud config set project $PROJECT_ID\ngcloud auth login\ngcloud auth application-default login\n```\n\n2. You need the following APIs to be enabled:\n- Cloud Build API\n- Artifact Registry API\n- Cloud Storage API\n- Vertex AI API\n```bash\ngcloud services enable \\\n    cloudbuild.googleapis.com \\\n    artifactregistry.googleapis.com \\\n    storage.googleapis.com \\\n    aiplatform.googleapis.com\n```\n\n3. 
Create an artifact registry repository for your base images (Docker format):\n```bash\nexport GAR_DOCKER_REPO_ID=<your_gar_repo_id_for_images>\nexport GAR_LOCATION=<your_gar_location>\ngcloud artifacts repositories create ${GAR_DOCKER_REPO_ID} \\\n    --location=${GAR_LOCATION} \\\n    --repository-format=docker\n```\n\n4. Build and upload your base images to the repository. To do so, please follow Google Cloud Build documentation.\n\n5. Create an artifact registry repository for your pipelines (KFP format):\n```bash\nexport GAR_PIPELINES_REPO_ID=<your_gar_repo_id_for_pipelines>\ngcloud artifacts repositories create ${GAR_PIPELINES_REPO_ID} \\\n    --location=${GAR_LOCATION} \\\n    --repository-format=kfp\n```\n\n6. Create a GCS bucket for Vertex Pipelines staging:\n```bash\nexport GCP_REGION=<your_gcp_region>\nexport VERTEX_STAGING_BUCKET_NAME=<your_bucket_name>\ngcloud storage buckets create gs://${VERTEX_STAGING_BUCKET_NAME} --location=${GCP_REGION}\n```\n\n7. Create a service account for Vertex Pipelines:\n```bash\nexport VERTEX_SERVICE_ACCOUNT_NAME=foobar\nexport VERTEX_SERVICE_ACCOUNT=\"${VERTEX_SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com\"\n\ngcloud iam service-accounts create ${VERTEX_SERVICE_ACCOUNT_NAME}\n\ngcloud projects add-iam-policy-binding ${PROJECT_ID} \\\n    --member=\"serviceAccount:${VERTEX_SERVICE_ACCOUNT}\" \\\n    --role=\"roles/aiplatform.user\"\n\ngcloud storage buckets add-iam-policy-binding gs://${VERTEX_STAGING_BUCKET_NAME} \\\n    --member=\"serviceAccount:${VERTEX_SERVICE_ACCOUNT}\" \\\n    --role=\"roles/storage.objectUser\"\n\ngcloud artifacts repositories add-iam-policy-binding ${GAR_PIPELINES_REPO_ID} \\\n   --location=${GAR_LOCATION} \\\n   --member=\"serviceAccount:${VERTEX_SERVICE_ACCOUNT}\" \\\n   --role=\"roles/artifactregistry.admin\"\n```\n<!-- --8<-- [end:setup] -->\nYou can use the deployer CLI (see example below) or import [`VertexPipelineDeployer`](deployer/pipeline_deployer.py) in your code (try it 
yourself).\n\n### \ud83d\udcc1 Folder Structure\n\n<!-- --8<-- [start:folder_structure] -->\nYou must respect the following folder structure. If you already follow the\n[Vertex Pipelines Starter Kit folder structure](https://github.com/artefactory/vertex-pipeline-starter-kit), it should be pretty smooth to use this tool:\n\n```\nvertex\n\u251c\u2500 configs/\n\u2502  \u2514\u2500 {pipeline_name}\n\u2502     \u2514\u2500 {config_name}.json\n\u2514\u2500 pipelines/\n   \u2514\u2500 {pipeline_name}.py\n```\n\n!!! tip \"About folder structure\"\n    You must have at least these files. If you need to share some config elements between pipelines,\n    you can have a `shared` folder in `configs` and import them in your pipeline configs.\n\n    If you're following a different folder structure, you can change the default paths in the `pyproject.toml` file.\n    See [Configuration](#configuration) section for more information.\n\n#### Pipelines\n\nYour file `{pipeline_name}.py` must contain a function called `{pipeline_name}` decorated using `kfp.dsl.pipeline`.\nIn previous versions, the functions / object used to be called `pipeline` but it was changed to `{pipeline_name}` to avoid confusion with the `kfp.dsl.pipeline` decorator.\n\n```python\n# vertex/pipelines/dummy_pipeline.py\nimport kfp.dsl\n\n# New name to avoid confusion with the kfp.dsl.pipeline decorator\n@kfp.dsl.pipeline()\ndef dummy_pipeline():\n    ...\n\n# Old name\n@kfp.dsl.pipeline()\ndef pipeline():\n    ...\n```\n\n#### Configs\n\nConfig file can be either `.py`, `.json`, `.toml` or `yaml` format.\nThey must be located in the `config/{pipeline_name}` folder.\n\n**Why multiple formats?**\n\n`.py` files are useful to define complex configs (e.g. a list of dicts) while `.json` / `.toml` / `yaml` files are useful to define simple configs (e.g. 
a string).\nIt also adds flexibility to the user and allows you to use the deployer with almost no migration cost.\n\n**How to format them?**\n\n- `.py` files must be valid python files with two important elements:\n\n    * `parameter_values` to pass arguments to your pipeline\n    * `input_artifacts` if you want to retrieve and create input artifacts to your pipeline.\n    See [Vertex Documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob) for more information.\n\n- `.json` files must be valid json files containing only one dict of key: value representing parameter values.\n- `.toml` files must be the same. Please note that TOML sections will be flattened, except for inline tables.\n    Section names will be joined using `\"_\"` separator and this is not configurable at the moment.\n    Example:\n\n    === \"TOML file\"\n        ```toml\n        [modeling]\n        model_name = \"my-model\"\n        params = { lambda = 0.1 }\n        ```\n\n    === \"Resulting parameter values\"\n        ```python\n        {\n            \"modeling_model_name\": \"my-model\",\n            \"modeling_params\": { \"lambda\": 0.1 }\n        }\n        ```\n\n- `.yaml` files must be valid yaml files containing only one dict of key: value representing parameter values.\n\n??? question \"Why are sections flattened when using TOML config files?\"\n    Vertex Pipelines parameter validation and parameter logging to Vertex Experiments are based on the parameter name.\n    If you do not flatten your sections, you'll only be able to validate section names and that they should be of type `dict`.\n\n    Not very useful.\n\n??? question \"Why aren't `input_artifacts` supported in TOML / JSON config files?\"\n    Because it's low on the priority list. Feel free to open a PR if you want to add it.\n\n\n**How to name them?**\n\n`{config_name}.py` or `{config_name}.json` or `{config_name}.toml`. 
config_name is free but must be unique for a given pipeline.\n\n\n#### Settings\n\nYou will also need the following ENV variables, either exported or in a `.env` file (see example in `example.env`):\n\n```bash\nPROJECT_ID=YOUR_PROJECT_ID  # GCP Project ID\nGCP_REGION=europe-west1  # GCP Region\n\nGAR_LOCATION=europe-west1  # Google Artifact Registry Location\nGAR_PIPELINES_REPO_ID=YOUR_GAR_KFP_REPO_ID  # Google Artifact Registry Repo ID (KFP format)\n\nVERTEX_STAGING_BUCKET_NAME=YOUR_VERTEX_STAGING_BUCKET_NAME  # GCS Bucket for Vertex Pipelines staging\nVERTEX_SERVICE_ACCOUNT=YOUR_VERTEX_SERVICE_ACCOUNT  # Vertex Pipelines Service Account\n```\n\n!!! note \"About env files\"\n    We're using env files and dotenv to load the environment variables.\n    No default value for `--env-file` argument is provided to ensure that you don't accidentally deploy to the wrong project.\n    An [`example.env`](./example/example.env) file is provided in this repo.\n    This also allows you to work with multiple environments thanks to env files (`test.env`, `dev.env`, `prod.env`, etc)\n<!-- --8<-- [end:folder_structure] -->\n\n<!-- --8<-- [start:usage] -->\n### \ud83d\ude80 CLI: Deploying a Pipeline with `deploy`\n\nLet's say you defined a pipeline in `dummy_pipeline.py` and a config file named `config_test.json`. You can deploy your pipeline using the following command:\n```bash\nvertex-deployer deploy dummy_pipeline \\\n    --compile \\\n    --upload \\\n    --run \\\n    --env-file example.env \\\n    --tags my-tag \\\n    --config-filepath vertex/configs/dummy_pipeline/config_test.json \\\n    --experiment-name my-experiment \\\n    --enable-caching \\\n    --skip-validation\n```\n\n### \u2705 CLI: Checking Pipelines are valid with `check`\n\nTo check that your pipelines are valid, you can use the `check` command. 
It uses a pydantic model to:\n- check that your pipeline imports and definition are valid\n- check that your pipeline can be compiled\n- check that all configs related to the pipeline are respecting the pipeline definition (using a Pydantic model based on pipeline signature)\n\nTo validate one or multiple pipeline(s):\n```bash\nvertex-deployer check dummy_pipeline <other pipeline name>\n```\n\nTo validate all pipelines in the `vertex/pipelines` folder:\n```bash\nvertex-deployer check --all\n```\n\n\n### \ud83d\udee0\ufe0f CLI: Other commands\n\n#### `config`\n\nYou can check your `vertex-deployer` configuration options using the `config` command.\nFields set in `pyproject.toml` will overwrite default values and will be displayed differently:\n```bash\nvertex-deployer config --all\n```\n\n#### `create`\n\nYou can create all files needed for a pipeline using the `create` command:\n```bash\nvertex-deployer create my_new_pipeline --config-type py\n```\n\nThis will create a `my_new_pipeline.py` file in the `vertex/pipelines` folder and a `vertex/config/my_new_pipeline/` folder with multiple config files in it.\n\n#### `init`\n\nTo initialize the deployer with default settings and folder structure, use the `init` command:\n```bash\nvertex-deployer init\n```\n\n```bash\n$ vertex-deployer init\nWelcome to Vertex Deployer!\nThis command will help you getting fired up.\nDo you want to configure the deployer? [y/n]: n\nDo you want to build default folder structure [y/n]: n\nDo you want to create a pipeline? [y/n]: n\nAll done \u2728\n```\n\n#### `list`\n\nYou can list all pipelines in the `vertex/pipelines` folder using the `list` command:\n```bash\nvertex-deployer list --with-configs\n```\n\n### \ud83c\udf6d CLI: Options\n\n```bash\nvertex-deployer --help\n```\n\nTo see package version:\n```bash\nvertex-deployer --version\n```\n\nTo adapt log level, use the `--log-level` option. 
Default is `INFO`.\n```bash\nvertex-deployer --log-level DEBUG deploy ...\n```\n\n<!-- --8<-- [end:usage] -->\n\n## Configuration\n\nYou can configure the deployer using the `pyproject.toml` file to better fit your needs.\nThis will overwrite default values. It can be useful if you always use the same options, e.g. always the same `--scheduler-timezone`.\n\n```toml\n[tool.vertex-deployer]\nvertex_folder_path = \"my/path/to/vertex\"\nlog_level = \"INFO\"\n\n[tool.vertex-deployer.deploy]\nscheduler_timezone = \"Europe/Paris\"\n```\n\nYou can display all the configurable parameters with their default values by running:\n```bash\n$ vertex-deployer config --all\n'*' means the value was set in config file\n\n* vertex_folder_path=my/path/to/vertex\n* log_level=INFO\ndeploy\n  env_file=None\n  compile=True\n  upload=False\n  run=False\n  schedule=False\n  cron=None\n  delete_last_schedule=False\n  * scheduler_timezone=Europe/Paris\n  tags=['latest']\n  config_filepath=None\n  config_name=None\n  enable_caching=False\n  experiment_name=None\ncheck\n  all=False\n  config_filepath=None\n  raise_error=False\nlist\n  with_configs=True\ncreate\n  config_type=json\n```\n\n## Repository Structure\n\n```\n\u251c\u2500 .github\n\u2502  \u251c\u2500 ISSUE_TEMPLATE/\n\u2502  \u251c\u2500 workflows\n\u2502  \u2502  \u251c\u2500 ci.yaml\n\u2502  \u2502  \u251c\u2500 pr_agent.yaml\n\u2502  \u2502  \u2514\u2500 release.yaml\n\u2502  \u251c\u2500 CODEOWNERS\n\u2502  \u2514\u2500 PULL_REQUEST_TEMPLATE.md\n\u251c\u2500 deployer                                     # Source code\n\u2502  \u251c\u2500 __init__.py\n\u2502  \u251c\u2500 cli.py\n\u2502  \u251c\u2500 constants.py\n\u2502  \u251c\u2500 pipeline_checks.py\n\u2502  \u251c\u2500 pipeline_deployer.py\n\u2502  \u251c\u2500 settings.py\n\u2502  \u2514\u2500 utils\n\u2502     \u251c\u2500 config.py\n\u2502     \u251c\u2500 console.py\n\u2502     \u251c\u2500 exceptions.py\n\u2502     \u251c\u2500 logging.py\n\u2502     \u251c\u2500 
models.py\n\u2502     \u2514\u2500 utils.py\n\u251c\u2500 docs/                                        # Documentation folder (mkdocs)\n\u251c\u2500 templates/                                   # Semantic Release templates\n\u251c\u2500 tests/\n\u251c\u2500 example                                      # Example folder with dummy pipeline and config\n\u2502   \u251c\u2500 example.env\n\u2502   \u2514\u2500 vertex\n\u2502      \u251c\u2500 components\n\u2502      \u2502  \u2514\u2500 dummy.py\n\u2502      \u251c\u2500 configs\n\u2502      \u2502  \u251c\u2500 broken_pipeline\n\u2502      \u2502  \u2502  \u2514\u2500 config_test.json\n\u2502      \u2502  \u2514\u2500 dummy_pipeline\n\u2502      \u2502     \u251c\u2500 config_test.json\n\u2502      \u2502     \u251c\u2500 config.py\n\u2502      \u2502     \u2514\u2500 config.toml\n\u2502      \u251c\u2500 deployment\n\u2502      \u251c\u2500 lib\n\u2502      \u2514\u2500 pipelines\n\u2502         \u251c\u2500 broken_pipeline.py\n\u2502         \u2514\u2500 dummy_pipeline.py\n\u251c\u2500 .gitignore\n\u251c\u2500 .pre-commit-config.yaml\n\u251c\u2500 catalog-info.yaml                            # Roadie integration configuration\n\u251c\u2500 CHANGELOG.md\n\u251c\u2500 CONTRIBUTING.md\n\u251c\u2500 LICENSE\n\u251c\u2500 Makefile\n\u251c\u2500 mkdocs.yml                                   # Mkdocs configuration\n\u251c\u2500 pyproject.toml\n\u2514\u2500 README.md\n```\n\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "Check, compile, upload, run, and schedule Kubeflow Pipelines on GCP Vertex AI in a standardized manner.",
    "version": "0.5.4",
    "project_urls": {
        "Documentation": "https://artefactory.github.io/vertex-pipelines-deployer/",
        "Homepage": "https://github.com/artefactory/vertex-pipelines-deployer",
        "Repository": "https://github.com/artefactory/vertex-pipelines-deployer"
    },
    "split_keywords": [
        "kubeflow",
        " vertexai",
        " aiplatform",
        " gcp",
        " mlops",
        " deployer",
        " pipeline"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7889fdff3222288f6c7e50ba469ac26cbae27e8e51532fa661e1572e8ccf2c02",
                "md5": "42aabf341ce30279200fefb3c3295d22",
                "sha256": "922b08da9e797a1f4f67f0a4ca37ae5667e0af35915d0053be543962827807bd"
            },
            "downloads": -1,
            "filename": "vertex_deployer-0.5.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "42aabf341ce30279200fefb3c3295d22",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<3.13.0,>=3.8",
            "size": 42193,
            "upload_time": "2024-10-11T20:26:30",
            "upload_time_iso_8601": "2024-10-11T20:26:30.939777Z",
            "url": "https://files.pythonhosted.org/packages/78/89/fdff3222288f6c7e50ba469ac26cbae27e8e51532fa661e1572e8ccf2c02/vertex_deployer-0.5.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0d97bdda25246c00f1f2fc4e4d71324d8efd7f3f54438460341bbcc4979cbe9d",
                "md5": "2ee74df748cea1b2dc7acbddfaf6d706",
                "sha256": "198355af6c8f27f2340b86514c53d123eb541c5ef8a23a65b2105448cd8fcb30"
            },
            "downloads": -1,
            "filename": "vertex_deployer-0.5.4.tar.gz",
            "has_sig": false,
            "md5_digest": "2ee74df748cea1b2dc7acbddfaf6d706",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<3.13.0,>=3.8",
            "size": 40011,
            "upload_time": "2024-10-11T20:26:32",
            "upload_time_iso_8601": "2024-10-11T20:26:32.482175Z",
            "url": "https://files.pythonhosted.org/packages/0d/97/bdda25246c00f1f2fc4e4d71324d8efd7f3f54438460341bbcc4979cbe9d/vertex_deployer-0.5.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-11 20:26:32",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "artefactory",
    "github_project": "vertex-pipelines-deployer",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "vertex-deployer"
}
        