pramen-py

- Name: pramen-py
- Version: 1.8.5
- Home page: https://github.com/AbsaOSS/pramen
- Summary: Pramen transformations written in python
- Upload time: 2024-04-27 07:16:05
- Maintainer: Artem Zhukov
- Author: Artem Zhukov
- Requires Python: <4.0,>=3.6.8
- License: none recorded
- Keywords: paramen, pyspark, transformations, metastore
- Requirements: none recorded

# Pramen-py

CLI application for defining data transformations for Pramen.

See:
```bash
pramen-py --help
```
for more information.


## Installation

### App settings

Application configuration is handled through environment variables
(see `.env.example`).

### Add pramen-py as a dependency to your project

With Poetry:

```bash
# ensure we have valid poetry environment
ls pyproject.toml || poetry init

poetry add pramen-py
```
With pip:

```bash
pip install pramen-py
```


## Usage

### Application configuration

To configure pramen-py options, set the corresponding environment
variables. To list the available options, run:

```bash
pramen-py list-configuration-options
```
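
As a rough illustration of environment-driven configuration, the sketch below reads settings from a mapping of environment variables. The variable names (`EXAMPLE_SPARK_APP_NAME`, `EXAMPLE_DEBUG`) and the `AppSettings` container are hypothetical and are not pramen-py options; use `pramen-py list-configuration-options` for the real list.

```python
import os
from typing import NamedTuple


class AppSettings(NamedTuple):
    """Hypothetical settings container; the fields are illustrative only."""

    spark_app_name: str
    debug: bool


def load_settings(env=None):
    """Build AppSettings from a mapping of environment variables."""
    env = os.environ if env is None else env
    debug_raw = env.get("EXAMPLE_DEBUG", "false")
    return AppSettings(
        # Fall back to a default when the variable is not set.
        spark_app_name=env.get("EXAMPLE_SPARK_APP_NAME", "example-app"),
        # Treat a few common spellings as "true"; everything else is false.
        debug=debug_raw.strip().lower() in {"1", "true", "yes"},
    )


print(load_settings({"EXAMPLE_DEBUG": "true"}))
```

Passing an explicit mapping instead of relying on `os.environ` keeps the loader easy to test.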

### Developing transformations

pramen-py uses Python's
[namespace packages](https://packaging.python.org/en/latest/guides/packaging-namespace-packages/#native-namespace-packages)
to discover transformations.

This means that a new transformer must be located inside a Python package
that contains a `transformations` directory.

This directory should be declared as a package:
- for poetry
```toml
[tool.poetry]
# ...
packages = [
    { include = "transformations" },
]

```
- for setup.py
```python
from setuptools import setup, find_namespace_packages

setup(
    name='mynamespace-subpackage-a',
    # ...
    packages=find_namespace_packages(include=['transformations.*'])
)
```

Example file structure:
```
❯ tree .
.
├── README.md
├── poetry.lock
├── pyproject.toml
├── tests
│  └── test_identity_transformer.py
└── transformations
    └── identity_transformer
        ├── __init__.py
        └── example_config.yaml
```
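
To show how discovery through a namespace package can work in general, here is a minimal sketch based on the standard library's `importlib` and `pkgutil`. It is not pramen-py's actual discovery code; the `discover_modules` helper is hypothetical, and the demo builds a throwaway directory that mirrors the tree above.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_modules(namespace="transformations"):
    """Import every module or sub-package found under a namespace package."""
    package = importlib.import_module(namespace)
    found = []
    # A namespace package's __path__ can span several installed distributions.
    for info in pkgutil.iter_modules(package.__path__, prefix=namespace + "."):
        importlib.import_module(info.name)
        found.append(info.name)
    return sorted(found)


# Demo: create transformations/identity_transformer/__init__.py in a temp dir.
# Note that `transformations` itself has no __init__.py (namespace package).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "transformations" / "identity_transformer"
pkg_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

print(discover_modules())
```

Because `transformations` has no `__init__.py` of its own, several independently installed packages can each contribute sub-packages to it.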

For a transformer to be picked up by pramen-py, the following
conditions must be satisfied:
- the Python package containing the transformers must be installed in the
same Python environment as pramen-py
- the Python package must define the namespace package `transformations`
- transformers must extend the `pramen_py.Transformation` base class

Subclasses of the `Transformation` base class are registered as
CLI commands (`pramen-py transformations run TransformationSubclassName`)
with default options. Check:

```bash
pramen-py transformations run ExampleTransformation1 --help
```

for more details.
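
The idea of registering subclasses under their class name can be sketched with `__init_subclass__`. This is an illustration of the general pattern only: the `Transformation` class below is a stand-in, not `pramen_py.Transformation`, and `registry`, `run_by_name`, and `IdentityTransformer` are hypothetical names.

```python
class Transformation:
    """Stand-in base class for illustration; NOT pramen_py.Transformation."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Record each subclass under its class name, the way a CLI could
        # expose one sub-command per transformation.
        Transformation.registry[cls.__name__] = cls

    def run(self, info_date):
        raise NotImplementedError


class IdentityTransformer(Transformation):
    def run(self, info_date):
        return "identity run for {}".format(info_date)


def run_by_name(name, info_date):
    """Look up a registered transformation by class name and run it."""
    try:
        cls = Transformation.registry[name]
    except KeyError:
        raise SystemExit("Unknown transformation: {}".format(name))
    return cls().run(info_date)


print(sorted(Transformation.registry))
print(run_by_name("IdentityTransformer", "2022-04-01"))
```

Defining the subclass is enough to register it, which is why the module that contains it has to be imported before the command can be found.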

You can add your own CLI options to your transformations. See the example at
[ExampleTransformation2](transformations/example_trasformation_two/some_transformation.py)

### pramen-py pytest plugin

pramen-py also provides a pytest plugin with helpful
fixtures for testing transformers.

List of available fixtures:
```bash
# install pramen-py into the environment and activate it first
pytest --fixtures
# check under --- fixtures defined from pramen_py.test_utils.fixtures ---
```

The pramen-py pytest plugin also loads environment variables from a `.env`
file if one is present in the root of the repo.
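
For readers unfamiliar with `.env` files, the sketch below approximates what loading one involves. It is a simplified stand-in rather than the plugin's own loader; the `load_dotenv` helper is hypothetical, and the choice to let variables that are already set take precedence is an assumption.

```python
import os


def load_dotenv(path=".env", environ=None):
    """Copy KEY=VALUE pairs from a .env file into environ, if the file exists."""
    environ = os.environ if environ is None else environ
    if not os.path.isfile(path):
        return {}
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")
            # Assumption: variables already set in the environment win.
            if key and key not in environ:
                environ[key] = value
                loaded[key] = value
    return loaded
```

Calling `load_dotenv()` from the repo root returns only the variables it actually added, which makes it easy to see what the file contributed.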

### Running and configuring transformations

Transformations can be run with the following command:
```bash
pramen-py transformations run \
  ExampleTransformation1 \
  --config config.yml \
  --info-date 2022-04-01
```

`--config` is a required option for every transformation. See
[config_example.yaml](tests/resources/real_config.yaml) for more information.

To check available options and documentation for a particular transformation,
run:
```bash
pramen-py transformations run TransformationClassName --help
```
where TransformationClassName is the name of the transformation.
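
When a run has to be triggered from a script or scheduler, the command line shown above can be assembled programmatically. The `build_run_command` helper below is hypothetical; it uses only the `--config` and `--info-date` options documented in this section.

```python
import datetime
import shlex


def build_run_command(transformation, config_path, info_date):
    """Return the argv list for `pramen-py transformations run ...`."""
    if not isinstance(info_date, datetime.date):
        raise TypeError("info_date must be a datetime.date")
    return [
        "pramen-py", "transformations", "run",
        transformation,
        "--config", str(config_path),
        # The CLI example above passes the date as YYYY-MM-DD.
        "--info-date", info_date.isoformat(),
    ]


command = build_run_command(
    "ExampleTransformation1", "config.yml", datetime.date(2022, 4, 1)
)
print(" ".join(shlex.quote(part) for part in command))
# To execute it: subprocess.run(command, check=True)
```

Building the arguments as a list avoids shell quoting problems when a config path contains spaces.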

## Using as a Library
Read metastore tables using the Pramen-Py API:
```python
import datetime
from pyspark.sql import SparkSession
from pramen_py import MetastoreReader
from pramen_py.utils.file_system import FileSystemUtils

# getOrCreate() is a method of the builder, not of SparkSession itself
spark = SparkSession.builder.getOrCreate()

hocon_config = FileSystemUtils(spark) \
    .load_hocon_config_from_hadoop("uri_or_path_to_file")

metastore = MetastoreReader(spark) \
    .from_config(hocon_config)

df_txn = metastore.get_table(
    "transactions",
    info_date_from=datetime.date(2022, 1, 1),
    info_date_to=datetime.date(2022, 6, 1)
)

df_customer = metastore.get_latest("customer")

df_txn.show(truncate=False)
df_customer.show(truncate=False)
```

## Development

Prerequisites:
- <https://python-poetry.org/docs/#installation>
- Python 3.6.8 or newer

Setup steps:

```bash
git clone https://github.com/AbsaOSS/pramen
cd pramen/pramen-py
make install  # create virtualenv and install dependencies
make test
make pre-commit

# enable completions
# source <(pramen-py completions zsh)
# source <(pramen-py completions bash)

pramen-py --help
```


### Load environment configuration

Before any development step, set up your development environment
variables:

```bash
make install
```

## Completions

```bash
# enable completions
source <(pramen-py completions zsh)
# or for bash
# source <(pramen-py completions bash)
```


## Deployment

### From the local development environment

```bash
# bump the version
vim pyproject.toml

# deploy to the dev environment (includes building and publishing
#   the artefacts)
cat .env.ci
make publish
```

            
