# data-pipelines-cli
[![Python Version](https://img.shields.io/badge/python-3.9%20%7C%203.10-blue.svg)](https://github.com/getindata/data-pipelines-cli)
[![PyPI Version](https://badge.fury.io/py/data-pipelines-cli.svg)](https://pypi.org/project/data-pipelines-cli/)
[![Downloads](https://pepy.tech/badge/data-pipelines-cli)](https://pepy.tech/project/data-pipelines-cli)
[![Maintainability](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/maintainability)](https://codeclimate.com/github/getindata/data-pipelines-cli/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/e44ed9383a42b59984f6/test_coverage)](https://codeclimate.com/github/getindata/data-pipelines-cli/test_coverage)
[![Documentation Status](https://readthedocs.org/projects/data-pipelines-cli/badge/?version=latest)](https://data-pipelines-cli.readthedocs.io/en/latest/?badge=latest)
CLI for the data platform
## Documentation
Read the full documentation at [https://data-pipelines-cli.readthedocs.io/](https://data-pipelines-cli.readthedocs.io/en/latest/index.html)
## Installation
Use the package manager [pip](https://pip.pypa.io/en/stable/) to install [dp (data-pipelines-cli)](https://pypi.org/project/data-pipelines-cli/):
```bash
pip install "data-pipelines-cli[bigquery,docker,datahub,gcs]"
```
## Usage
First, create a repository with a global configuration file that you or your organization will be using. The repository
should contain a `dp.yml.tmpl` file looking similar to this:
```yaml
_templates_suffix: ".tmpl"
_envops:
  autoescape: false
  block_end_string: "%]"
  block_start_string: "[%"
  comment_end_string: "#]"
  comment_start_string: "[#"
  keep_trailing_newline: true
  variable_end_string: "]]"
  variable_start_string: "[["

templates:
  my-first-template:
    template_name: my-first-template
    template_path: https://github.com/<YOUR_USERNAME>/<YOUR_TEMPLATE>.git

vars:
  username: [[ YOUR_USERNAME ]]
```
Thanks to [copier](https://copier.readthedocs.io/en/stable/), you can leverage the `.tmpl` template syntax to create
easily modifiable configuration templates. Just create a `copier.yml` file next to the `dp.yml.tmpl` one and configure
the template questions (read more in the [copier documentation](https://copier.readthedocs.io/en/stable/configuring/)).
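For instance, a minimal `copier.yml` for the template above might define the question that fills the `[[ YOUR_USERNAME ]]` placeholder (the question name and help text below are illustrative):
```yaml
# Hypothetical copier.yml question; copier prompts for it, and the answer
# is rendered wherever [[ YOUR_USERNAME ]] appears in dp.yml.tmpl
YOUR_USERNAME:
  type: str
  help: GitHub username used to resolve template repository URLs
```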
Then, run `dp init <CONFIG_REPOSITORY_URL>` to initialize **dp**. You can also drop the `<CONFIG_REPOSITORY_URL>` argument,
in which case **dp** will be initialized with an empty config.
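For example, assuming your organization keeps its shared config in a Git repository (the URL below is a placeholder):
```bash
# Initialize dp from a shared configuration repository
dp init https://github.com/<YOUR_ORG>/<YOUR_CONFIG_REPO>.git

# Or start from an empty configuration
dp init
```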
### Project creation
You can use `dp create <NEW_PROJECT_PATH>` to choose one of the templates added before and create the project in the
`<NEW_PROJECT_PATH>` directory. You can also use `dp create <NEW_PROJECT_PATH> <LINK_TO_TEMPLATE_REPOSITORY>` to point
directly to a template repository. If `<LINK_TO_TEMPLATE_REPOSITORY>` matches the name of a template defined in
**dp**'s config file, `dp create` will use that template instead of trying to download the repository.
`dp template-list` lists all added templates.
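Putting it together, a typical flow might look like this (the project path is a placeholder; `my-first-template` refers to the template registered in the config example above):
```bash
# See which templates dp already knows about
dp template-list

# Create a project from a template registered in dp's config
dp create ./my-pipeline-project my-first-template

# Or point directly at a template repository
dp create ./my-pipeline-project https://github.com/<YOUR_USERNAME>/<YOUR_TEMPLATE>.git
```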
### Project update
To update your pipeline project, use `dp update <PIPELINE_PROJECT_PATH>`. It will sync your existing project with the
template version selected by the `--vcs-ref` option (default: `HEAD`).
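For example (the project path and tag below are placeholders):
```bash
# Sync the project with the template's HEAD (the default)
dp update ./my-pipeline-project

# Or pin the update to a specific tag or commit
dp update ./my-pipeline-project --vcs-ref v1.2.0
```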
### Project deployment
`dp deploy` will sync with your bucket provider, chosen automatically based on the remote URL.
Usually, it is worth pointing `dp deploy` to a JSON or YAML file with provider-specific data like access tokens or project
names. E.g., to connect to Google Cloud Storage, one should run:
```bash
echo '{"token": "<PATH_TO_YOUR_TOKEN>", "project_name": "<YOUR_PROJECT_NAME>"}' > gs_args.json
dp deploy --dags-path "gs://<YOUR_GS_PATH>" --blob-args gs_args.json
```
However, in some cases you do not need to do so, e.g. when using `gcloud` with properly set local credentials. In that
case, you can try running just the `dp deploy --dags-path "gs://<YOUR_GS_PATH>"` command. Please refer to the
[documentation](https://data-pipelines-cli.readthedocs.io/en/latest/usage.html#project-deployment) for more information.
When finished, call `dp clean` to remove compilation-related directories.
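For example, with local `gcloud` credentials already configured, a deploy-and-clean sequence might look like this (the bucket path is a placeholder):
```bash
# Deploy without a --blob-args file, relying on local gcloud credentials
dp deploy --dags-path "gs://<YOUR_GS_PATH>"

# Remove compilation-related directories afterwards
dp clean
```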
### Variables
You can put a dictionary of variables to be passed to `dbt` in your `config/<ENV>/dbt.yml` file, following the convention
presented in [the guide at the dbt site](https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables#defining-variables-in-dbt_projectyml).
E.g., if one of the fields of `config/<SNOWFLAKE_ENV>/snowflake.yml` looks like this:
```yaml
schema: "{{ var('snowflake_schema') }}"
```
you should put the following in your `config/<SNOWFLAKE_ENV>/dbt.yml` file:
```yaml
vars:
  snowflake_schema: EXAMPLE_SCHEMA
```
and then run `dp run --env <SNOWFLAKE_ENV>` (or a similar command).
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.