apache-liminal


- Name: apache-liminal
- Version: 0.0.5
- Home page: https://github.com/apache/incubator-liminal
- Summary: A package for authoring and deploying machine learning workflows
- Upload time: 2023-01-25 11:55:02
- Author: dev@liminal.apache.org
- Requires Python: >=3.6
- License: Apache License, Version 2.0
- Requirements: docker, apache-airflow, click, Flask, pyyaml, boto3, botocore, wheel, termcolor, docker-pycreds, typing, GitPython, moto, diskcache, croniter, pytz, pytzdata, freezegun, statsd, sqlalchemy, flatdict, jinja2, python-json-logger, requests, apache-airflow-providers-amazon, apache-airflow-providers-cncf-kubernetes, cfn-lint, pre-commit, Werkzeug, itsdangerous, MarkupSafe

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

# Apache Liminal

Apache Liminal is an end-to-end platform for data engineers & scientists, allowing them to build,
train and deploy machine learning models in a robust and agile way.

The platform provides the abstractions and declarative capabilities for
data extraction & feature engineering followed by model training and serving.
Liminal's goal is to operationalize the machine learning process, allowing data scientists to
quickly transition from a successful experiment to an automated pipeline of model training,
validation, deployment and inference in production, freeing them from engineering and
non-functional tasks, and allowing them to focus on machine learning code and artifacts.

# Basics

Using simple YAML configuration, create your own scheduled data pipelines (a sequence of tasks to
perform), application servers, and more.

## Getting Started

A simple getting started guide for Liminal can be found [here](docs/getting-started/hello_world.md).

## Apache Liminal Documentation

Full documentation of Apache Liminal can be found [here](docs/liminal)

## High Level Architecture

High level architecture documentation can be found [here](docs/architecture.md)

## Example YAML config file

```yaml
---
name: MyLiminalStack
owner: Bosco Albert Baracus
volumes:
  - volume: myvol1
    local:
      path: /Users/me/myvol1
images:
  - image: my_python_task_img
    type: python
    source: write_inputs
  - image: my_parallelized_python_task_img
    source: write_outputs
  - image: my_server_image
    type: python_server
    source: myserver
    endpoints:
      - endpoint: /myendpoint1
        module: my_server
        function: myendpoint1func
pipelines:
  - pipeline: my_pipeline
    start_date: 1970-01-01
    timeout_minutes: 45
    schedule: 0 * 1 * *
    metrics:
      namespace: TestNamespace
      backends: [ 'cloudwatch' ]
    tasks:
      - task: my_python_task
        type: python
        description: static input task
        image: my_python_task_img
        env_vars:
          NUM_FILES: 10
          NUM_SPLITS: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
      - task: my_parallelized_python_task
        type: python
        description: parallelized python task
        image: my_parallelized_python_task_img
        env_vars:
          FOO: BAR
        executors: 3
        mounts:
          - mount: mymount
            volume: myvol1
            path: /mnt/vol1
        cmd: python -u write_inputs.py
services:
  - service: my_python_server
    description: my python server
    image: my_server_image
```
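
The sections of the example refer to each other by name: tasks and services name an `image` declared under `images`, and mounts name a `volume` declared under `volumes`. The following is a minimal sketch of checking those references, assuming the YAML has already been parsed into a plain dict (for example with a YAML parser's safe loader) and that every task and service uses an image. `find_dangling_references` is a hypothetical helper for illustration and is not part of Liminal.

```python
def find_dangling_references(config):
    """Return messages for image/volume names that are used but never declared."""
    images = {img["image"] for img in config.get("images", [])}
    volumes = {vol["volume"] for vol in config.get("volumes", [])}
    problems = []

    for pipeline in config.get("pipelines", []):
        for task in pipeline.get("tasks", []):
            if task.get("image") not in images:
                problems.append(f"task {task['task']}: unknown image {task.get('image')}")
            for mount in task.get("mounts", []):
                if mount["volume"] not in volumes:
                    problems.append(f"task {task['task']}: unknown volume {mount['volume']}")

    for service in config.get("services", []):
        if service.get("image") not in images:
            problems.append(f"service {service['service']}: unknown image {service.get('image')}")

    return problems


# A trimmed-down version of the example above, written as the dict a YAML parser would produce.
example = {
    "volumes": [{"volume": "myvol1", "local": {"path": "/Users/me/myvol1"}}],
    "images": [
        {"image": "my_python_task_img", "type": "python", "source": "write_inputs"},
        {"image": "my_server_image", "type": "python_server", "source": "myserver"},
    ],
    "pipelines": [
        {
            "pipeline": "my_pipeline",
            "tasks": [
                {
                    "task": "my_python_task",
                    "image": "my_python_task_img",
                    "mounts": [{"mount": "mymount", "volume": "myvol1", "path": "/mnt/vol1"}],
                }
            ],
        }
    ],
    "services": [{"service": "my_python_server", "image": "my_server_image"}],
}

print(find_dangling_references(example))
```

An empty list means every referenced image and volume is declared somewhere in the file.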

# Installation

1. Install this repository (HEAD)

```bash
pip install git+https://github.com/apache/incubator-liminal.git
```

2. Optional: set `LIMINAL_HOME` to a path of your choice (if it is not set, it defaults to `~/liminal_home`):

```bash
echo 'export LIMINAL_HOME=</path/to/some/folder>' >> ~/.bash_profile && source ~/.bash_profile
```
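
The fallback rule described above can be sketched in Python as follows. This only illustrates the documented behaviour (use `LIMINAL_HOME` if set, otherwise `~/liminal_home`); `resolve_liminal_home` is a hypothetical name, and this is not necessarily how Liminal resolves the path internally.

```python
import os
from pathlib import Path


def resolve_liminal_home(environ=None):
    """Return LIMINAL_HOME if set and non-empty, otherwise ~/liminal_home."""
    environ = os.environ if environ is None else environ
    configured = environ.get("LIMINAL_HOME")
    if configured:
        return Path(configured).expanduser()
    return Path.home() / "liminal_home"


print(resolve_liminal_home())
```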

# Authoring pipelines

This involves at minimum creating a single file called liminal.yml as in the example above.

If your pipeline requires custom Python code to implement tasks, the code should be organized
[like this](https://github.com/apache/incubator-liminal/tree/master/tests/runners/airflow/liminal).
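
As an illustration of custom task code, the `my_python_task` task in the example config passes `NUM_FILES` and `NUM_SPLITS` as environment variables and mounts `myvol1` at `/mnt/vol1`. The sketch below shows what a script such as `write_inputs.py` could look like under those assumptions. The file naming, the JSON payload, the round-robin split assignment and the `OUTPUT_DIR` override are all invented for this example; this is not the script from the Liminal repository.

```python
import json
import os
import tempfile
from pathlib import Path


def write_inputs(output_dir, num_files, num_splits):
    """Write num_files small JSON files into output_dir and return their paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index in range(num_files):
        # Round-robin split assignment is an illustrative choice only.
        payload = {"file": index, "split": index % num_splits}
        path = output_dir / f"input_{index}.json"
        path.write_text(json.dumps(payload))
        written.append(path)
    return written


if __name__ == "__main__":
    # Environment variables are always strings, so convert explicitly.
    num_files = int(os.environ.get("NUM_FILES", "10"))
    num_splits = int(os.environ.get("NUM_SPLITS", "3"))
    # OUTPUT_DIR is a hypothetical override so the sketch also runs outside a
    # container; inside the task you would point it at the mount path /mnt/vol1.
    output_dir = os.environ.get("OUTPUT_DIR") or tempfile.mkdtemp()
    files = write_inputs(output_dir, num_files, num_splits)
    print(f"wrote {len(files)} files to {output_dir}")
```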

If your pipeline imports external packages that are not already part
of the Liminal framework (i.e. you had to pip install them yourself), you also need to provide
a requirements.txt in the root of your project.

# Testing the pipeline locally

When your pipeline code is ready, you can test it by running it locally on your machine.

1. Ensure you have the Docker engine running locally, and enable a local Kubernetes cluster:

  ![Kubernetes configured](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/k8s_running.png)

  Allocate at least 3 CPUs to it (under "Resources" in the Docker preferences UI).

  If you want to execute your pipeline on a remote Kubernetes cluster, make sure the cluster is configured using:

  ```bash
  kubectl config set-context <your remote kubernetes cluster>
  ```

2. Build the Docker images used by your pipeline.

In the example pipeline above, you can see that tasks and services have an "image" field, such as
"my_python_task_img". This means that the task is executed inside a Docker container, and the container
is created from a Docker image in which the required code and libraries are installed.

You can take a look at what the build process looks like, e.g.
[here](https://github.com/apache/incubator-liminal/tree/master/liminal/build/image/python)

In order for the images to be available for your pipeline, you'll need to build them locally:

```bash
cd </path/to/your/liminal/code>
liminal build
```

You'll see a number of outputs indicating that the various Docker images were built.

3. Create a Kubernetes local volume.

If your YAML works with [volumes](https://github.com/apache/incubator-liminal/blob/6253f8b2c9dc244af032979ec6d462dc3e07e170/docs/getting_started.md#mounted-volumes),
first run the following command:

```bash
cd </path/to/your/liminal/code>
liminal create
```

4. Deploy the pipeline:

```bash
cd </path/to/your/liminal/code>
liminal deploy
```

Note: after upgrading Liminal, it's recommended to run the following command:

```bash
liminal deploy --clean
```

This rebuilds the Airflow Docker containers from scratch with a fresh version of Liminal, ensuring consistency.

5. Start the server

```bash
liminal start
```

6. Stop the server

```bash
liminal stop
```

7. Display the server logs

```bash
# Usage: liminal logs --follow/--tail

# Number of lines to show from the end of the log:
liminal logs --tail=10

# Follow log output:
liminal logs --follow
```

8. Navigate to [http://localhost:8080/admin](http://localhost:8080/admin)

9. You should see your pipeline:

![pipeline](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/airflow.png)

The pipeline is scheduled to run according to the `schedule: 0 * 1 * *` field in the .yml file you provided.

10. To manually activate your pipeline:

- Click your pipeline and then click "trigger DAG".
- Click "Graph view".

You should see the steps in your pipeline being executed in near real time; click "Refresh" periodically to update the view.

![Pipeline activation](https://raw.githubusercontent.com/apache/incubator-liminal/master/images/airflow_trigger.png)
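
The `schedule` value in the example uses the standard five-field cron layout (minute, hour, day of month, month, day of week), so `0 * 1 * *` means minute 0 of every hour on the first day of each month. The sketch below makes that reading concrete with a deliberately tiny matcher that understands only `*` and plain integers. It is an illustration, not how Airflow evaluates schedules, and `matches_cron` is a hypothetical name.

```python
from datetime import datetime


def matches_cron(expression, moment):
    """Return True if `moment` matches a cron expression made of '*' and integers only."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    # Cron counts Sunday as 0; datetime.isoweekday() counts Sunday as 7.
    actual = (
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        moment.isoweekday() % 7,
    )
    # Simplification: every field must match (no ranges, lists or step values).
    return all(field == "*" or int(field) == value for field, value in zip(fields, actual))


# First of the month at 05:00, then the second of the month at 05:00.
print(matches_cron("0 * 1 * *", datetime(2023, 2, 1, 5, 0)))
print(matches_cron("0 * 1 * *", datetime(2023, 2, 2, 5, 0)))
```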

# Contributing

More information on contributing can be found [here](CONTRIBUTING.md).

# Community

The Liminal community holds a public call every Monday.

- [Liminal Community Calendar](https://calendar.google.com/calendar/u/0/r?cid=jom1i20emghura6s6ookhe2skk@group.calendar.google.com)
- [Dev-Mailing-List](https://lists.apache.org/list.html?dev@liminal.apache.org)

## Running Tests (for contributors)

When doing local development and running Liminal unit tests, make sure to set `LIMINAL_STAND_ALONE_MODE=True`.
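
One way to set the variable from inside a test, rather than in the shell, is sketched below using only the standard library. The test class and its single check are hypothetical placeholders; only the variable name and value come from this README.

```python
import os
import unittest
from unittest import mock


class StandAloneModeTest(unittest.TestCase):
    def setUp(self):
        # Set the variable for this test only; patch.dict restores os.environ afterwards.
        patcher = mock.patch.dict(os.environ, {"LIMINAL_STAND_ALONE_MODE": "True"})
        patcher.start()
        self.addCleanup(patcher.stop)

    def test_flag_is_visible(self):
        # Placeholder assertion: real tests would exercise Liminal code here.
        self.assertEqual(os.environ["LIMINAL_STAND_ALONE_MODE"], "True")


# Run the test case explicitly so the sketch works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(StandAloneModeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Exporting the variable in the shell before invoking the test runner achieves the same effect for a whole test session.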

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/apache/incubator-liminal",
    "name": "apache-liminal",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.6",
    "maintainer_email": "",
    "keywords": "",
    "author": "dev@liminal.apache.org",
    "author_email": "dev@liminal.apache.org",
    "download_url": "https://files.pythonhosted.org/packages/ff/e6/933c3b9b5c09b711ad3fa2eb3fef1d0d4d1b7750020f7f2672bd5e6d164c/apache-liminal-0.0.5.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache License, Version 2.0",
    "summary": "A package for authoring and deploying machine learning workflows",
    "version": "0.0.5",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "99024c3ae92cf20c59eaa2bcae8b55c508710fee8ecc1cb8ba28106cb5302059",
                "md5": "5e01377d828c03439dad54c3c60e9b42",
                "sha256": "6db89479803b5ae68492ecabd0dc6b8357ca1192d7644e367ffac863bc1c3c7b"
            },
            "downloads": -1,
            "filename": "apache_liminal-0.0.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "5e01377d828c03439dad54c3c60e9b42",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.6",
            "size": 169702,
            "upload_time": "2023-01-25T11:55:00",
            "upload_time_iso_8601": "2023-01-25T11:55:00.242601Z",
            "url": "https://files.pythonhosted.org/packages/99/02/4c3ae92cf20c59eaa2bcae8b55c508710fee8ecc1cb8ba28106cb5302059/apache_liminal-0.0.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "ffe6933c3b9b5c09b711ad3fa2eb3fef1d0d4d1b7750020f7f2672bd5e6d164c",
                "md5": "73fa5fdc7b2eea9b4c5802eed5c759fa",
                "sha256": "63612433fef766197cc9a7843b4bea1f5ddd78209524919f0f737895b109c8e0"
            },
            "downloads": -1,
            "filename": "apache-liminal-0.0.5.tar.gz",
            "has_sig": false,
            "md5_digest": "73fa5fdc7b2eea9b4c5802eed5c759fa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.6",
            "size": 62647,
            "upload_time": "2023-01-25T11:55:02",
            "upload_time_iso_8601": "2023-01-25T11:55:02.126493Z",
            "url": "https://files.pythonhosted.org/packages/ff/e6/933c3b9b5c09b711ad3fa2eb3fef1d0d4d1b7750020f7f2672bd5e6d164c/apache-liminal-0.0.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-01-25 11:55:02",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "apache",
    "github_project": "incubator-liminal",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "docker",
            "specs": [
                [
                    "==",
                    "4.2.0"
                ]
            ]
        },
        {
            "name": "apache-airflow",
            "specs": [
                [
                    "==",
                    "2.1.2"
                ]
            ]
        },
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "7.1.1"
                ]
            ]
        },
        {
            "name": "Flask",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "pyyaml",
            "specs": [
                [
                    "==",
                    "5.4.1"
                ]
            ]
        },
        {
            "name": "boto3",
            "specs": [
                [
                    "==",
                    "1.17.112"
                ]
            ]
        },
        {
            "name": "botocore",
            "specs": [
                [
                    "==",
                    "1.20.112"
                ]
            ]
        },
        {
            "name": "wheel",
            "specs": [
                [
                    "==",
                    "0.36.2"
                ]
            ]
        },
        {
            "name": "termcolor",
            "specs": [
                [
                    "~=",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "docker-pycreds",
            "specs": [
                [
                    "==",
                    "0.4.0"
                ]
            ]
        },
        {
            "name": "typing",
            "specs": [
                [
                    "==",
                    "3.7.4.1"
                ]
            ]
        },
        {
            "name": "GitPython",
            "specs": [
                [
                    "==",
                    "3.1.11"
                ]
            ]
        },
        {
            "name": "moto",
            "specs": [
                [
                    "==",
                    "1.3.14"
                ]
            ]
        },
        {
            "name": "diskcache",
            "specs": [
                [
                    "==",
                    "3.1.1"
                ]
            ]
        },
        {
            "name": "croniter",
            "specs": [
                [
                    "==",
                    "0.3.31"
                ]
            ]
        },
        {
            "name": "pytz",
            "specs": [
                [
                    "==",
                    "2020.5"
                ]
            ]
        },
        {
            "name": "pytzdata",
            "specs": [
                [
                    "==",
                    "2020.1"
                ]
            ]
        },
        {
            "name": "freezegun",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "statsd",
            "specs": [
                [
                    ">=",
                    "3.3.0"
                ],
                [
                    "<",
                    "4.0"
                ]
            ]
        },
        {
            "name": "sqlalchemy",
            "specs": [
                [
                    "~=",
                    "1.3.15"
                ]
            ]
        },
        {
            "name": "flatdict",
            "specs": [
                [
                    "==",
                    "3.4.0"
                ]
            ]
        },
        {
            "name": "jinja2",
            "specs": [
                [
                    ">=",
                    "2.10.1"
                ],
                [
                    "<",
                    "2.11.0"
                ]
            ]
        },
        {
            "name": "python-json-logger",
            "specs": [
                [
                    "==",
                    "2.0.1"
                ]
            ]
        },
        {
            "name": "requests",
            "specs": [
                [
                    "==",
                    "2.26.0"
                ]
            ]
        },
        {
            "name": "apache-airflow-providers-amazon",
            "specs": [
                [
                    "==",
                    "1.4.0"
                ]
            ]
        },
        {
            "name": "apache-airflow-providers-cncf-kubernetes",
            "specs": [
                [
                    "==",
                    "1.0.2"
                ]
            ]
        },
        {
            "name": "cfn-lint",
            "specs": [
                [
                    "==",
                    "0.53.0"
                ]
            ]
        },
        {
            "name": "pre-commit",
            "specs": [
                [
                    "==",
                    "2.16.0"
                ]
            ]
        },
        {
            "name": "Werkzeug",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "itsdangerous",
            "specs": [
                [
                    "==",
                    "1.1.0"
                ]
            ]
        },
        {
            "name": "MarkupSafe",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        }
    ],
    "lcname": "apache-liminal"
}
        