sf-hamilton-contrib


Name: sf-hamilton-contrib
Version: 0.0.8
Home page: https://github.com/dagworks-inc/hamilton/contrib
Summary: Hamilton's user contributed shared dataflow library.
Upload time: 2024-03-11 22:33:13
Docs URL: None
Author: Stefan Krawczyk, Elijah ben Izzy
Requires Python: >=3.8, <4
Keywords: hamilton collaborative shared dataflow library contrib
Requirements: numpy pandas typing_extensions typing_inspect
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.

## Off-the-shelf Hamilton Dataflows

Here you'll find documentation about the sf-hamilton-contrib package, which curates a collection of Hamilton dataflows that are
ready to be used in your own projects. They are user-contributed and user-maintained, with
the goal of making it easier for you to get started with Hamilton.

We expect this collection to grow over time, so check back often! As dataflows mature, we
will move them into the official sub-package of the repository, where they will be maintained by the
Hamilton team.

### Usage
There are two ways to access the dataflows in this package. Either way, you need the requisite
Python dependencies installed on your system; you'll get import errors if you don't.
Don't know what you need? We have convenience functions to help!
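Each dataflow ships a `requirements.txt` listing the packages it needs beyond Hamilton. The helper below is not one of the convenience functions mentioned above, which this README does not name. It is a stdlib-only sketch (hypothetical names `parse_requirement_names` and `find_missing`) that uses `importlib.metadata` to report which listed distributions are not installed. The parser is deliberately loose: it skips comments, blank lines, and pip options such as `-r`, and it does not handle direct URL requirements.

```python
import re
from importlib import metadata
from typing import List

# Leading distribution name, before any extras, version specifier, or marker.
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(requirements_text: str) -> List[str]:
    """Return the distribution names listed in requirements.txt-style text."""
    names = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options like -r
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def find_missing(requirements_text: str) -> List[str]:
    """Return the listed distributions that are not installed in this environment."""
    missing = []
    for name in parse_requirement_names(requirements_text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `find_missing(open("requirements.txt").read())`, run next to a dataflow's `requirements.txt`, lists what you still need to `pip install`.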

#### Static installation
This approach relies on you installing the package on your system. This is the recommended path for
production purposes as you can version-lock your dependencies.

To install the package, run:

```bash
pip install sf-hamilton-contrib --upgrade
```

Once installed, you can import the dataflows as follows.

Things you need to know:
1. Whether it's a user-contributed or an official DAGWorks-supported dataflow and, if user-contributed, the name of the user.
2. The name of the dataflow.

```python
from hamilton import driver
# from hamilton.contrib.dagworks import NAME_OF_DATAFLOW
from hamilton.contrib.user.NAME_OF_USER import NAME_OF_DATAFLOW

dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(NAME_OF_DATAFLOW)
    .build()
)
# execute the dataflow, specifying what you want back. Will return a dictionary.
result = dr.execute(
    [NAME_OF_DATAFLOW.FUNCTION_NAME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
```
To find an example, [go to the hub](https://hub.dagworks.io/docs/).
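Conceptually, a dataflow module is a set of plain Python functions in which each parameter name refers either to another function in the module or to a key in `inputs`, and `dr.execute(...)` computes only what the requested outputs depend on. The toy resolver below is a stdlib-only sketch of that idea, meant to build intuition; it is not Hamilton's implementation and ignores configuration, decorators, adapters, and type checking. `toy_execute` and the demo functions are hypothetical.

```python
import inspect
import types
from typing import Any, Callable, Dict, List, Tuple


def toy_execute(
    module: types.ModuleType, final_vars: List[str], inputs: Dict[str, Any]
) -> Dict[str, Any]:
    """Toy stand-in for a driver: resolve final_vars by matching parameter names.

    Only public functions defined in `module` itself count as nodes; every
    parameter must be satisfied by `inputs` or by another node.
    """
    nodes: Dict[str, Callable[..., Any]] = {
        name: fn
        for name, fn in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_") and fn.__module__ == module.__name__
    }
    computed: Dict[str, Any] = dict(inputs)  # inputs act as pre-computed values

    def resolve(name: str, visiting: Tuple[str, ...]) -> Any:
        if name in computed:  # each node is computed at most once
            return computed[name]
        if name in visiting:
            raise ValueError(f"cycle detected at {name!r}")
        if name not in nodes:
            raise KeyError(f"no input or function named {name!r}")
        fn = nodes[name]
        kwargs = {
            param: resolve(param, visiting + (name,))
            for param in inspect.signature(fn).parameters
        }
        computed[name] = fn(**kwargs)
        return computed[name]

    return {var: resolve(var, ()) for var in final_vars}


# Demo: build a tiny in-memory "dataflow" module (hypothetical functions).
demo = types.ModuleType("demo_dataflow")
exec(
    "def spend_per_signup(spend: float, signups: int) -> float:\n"
    "    return spend / signups\n"
    "\n"
    "def spend_per_signup_doubled(spend_per_signup: float) -> float:\n"
    "    return 2 * spend_per_signup\n",
    demo.__dict__,
)
print(toy_execute(demo, ["spend_per_signup_doubled"], {"spend": 100.0, "signups": 4}))
# {'spend_per_signup_doubled': 50.0}
```

The real driver does much more, but the naming convention is the part you rely on when choosing what to request in `dr.execute(...)` and what to pass as `inputs`.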

#### Dynamic installation
Here we dynamically download the dataflow from the internet and execute it. This is useful for quickly
iterating in a notebook and pulling in just the dataflow you need.

```python
from hamilton import dataflows, driver

# downloads into ~/.hamilton/dataflows and loads the module -- WARNING: ensure you know what code you're importing!
# NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW") # if using official DAGWorks dataflow
NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW", "NAME_OF_USER")
dr = (
    driver.Builder()
    .with_config({})  # replace with configuration as appropriate
    .with_modules(NAME_OF_DATAFLOW)
    .build()
)
# execute the dataflow, specifying what you want back. Will return a dictionary.
result = dr.execute(
    [NAME_OF_DATAFLOW.FUNCTION_NAME, ...],  # this specifies what you want back
    inputs={...}  # pass in inputs as appropriate
)
```
To find an example, [go to the hub](https://hub.dagworks.io/docs/).
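Because dynamic installation executes code fetched from the internet, it is worth reading that code before you run it. The stdlib sketch below lists every Python file under the download directory named in the code comment above (`~/.hamilton/dataflows`). `list_downloaded_python_files` is a hypothetical helper, and it deliberately assumes nothing about how files are laid out beneath that directory.

```python
from pathlib import Path
from typing import List, Optional


def list_downloaded_python_files(root: Optional[Path] = None) -> List[Path]:
    """Return every .py file under the dataflow download directory, sorted.

    Defaults to ~/.hamilton/dataflows; returns an empty list if it does not exist.
    """
    if root is None:
        root = Path.home() / ".hamilton" / "dataflows"
    if not root.exists():
        return []
    return sorted(path for path in root.rglob("*.py") if path.is_file())


if __name__ == "__main__":
    for path in list_downloaded_python_files():
        print(path)  # open each file and review it before importing
```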

#### Modification
Getting started is one thing; modifying a dataflow to fit your needs is another. So we have a prescribed
flow that lets you take a dataflow and copy its code to a location of your choosing, where you can
easily modify it as you see fit.

Run this in a notebook or Python script to copy the dataflow to a directory of your choosing, using whichever option matches how you obtained it:
```python
from hamilton import dataflows

# Option 1: dynamically pull, then copy
NAME_OF_DATAFLOW = dataflows.import_module("NAME_OF_DATAFLOW", "NAME_OF_USER")
dataflows.copy(NAME_OF_DATAFLOW, destination_path="PATH_TO_DIRECTORY")

# Option 2: copy from the installed library
from hamilton.contrib.user.NAME_OF_USER import NAME_OF_DATAFLOW
dataflows.copy(NAME_OF_DATAFLOW, destination_path="PATH_TO_DIRECTORY")
```
You can then modify/import the code as you see fit. See [copy()](https://hamilton.dagworks.io/en/latest/reference/dataflows/copy/)
for more details.
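Once `copy()` has placed the dataflow's code in your directory, it is ordinary Python, so you can import it like any other module if that directory is on `sys.path`. If it isn't, the stdlib sketch below imports a module, or a package via its `__init__.py`, from an explicit path. `import_from_path` is a hypothetical helper, and any path you pass it should be checked against where `copy()` actually put the files.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def import_from_path(module_name: str, file_path: Path) -> ModuleType:
    """Import a module, or a package via its __init__.py, from an explicit path."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {file_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so relative imports inside a package resolve.
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        del sys.modules[module_name]  # don't leave a half-initialized module behind
        raise
    return module
```

The returned module can then be passed to `driver.Builder().with_modules(...)` exactly as in the usage examples above.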


### How to contribute

If you have a dataflow that you would like to share with the community, please submit a pull request
to this repository. We will review your dataflow and, if it meets our standards, add it to the
package. When submitting a pull request, please use [this template](https://github.com/DAGWorks-Inc/hamilton/blob/main/.github/PULL_REQUEST_TEMPLATE/HAMILTON_CONTRIB_PR_TEMPLATE.md).
To access it, create a new pull request, open the `preview` tab, and click the link to append `template=HAMILTON_CONTRIB_PR_TEMPLATE.md` to the URL.
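You can also build that URL yourself, since GitHub reads the pull request template from the `template` query parameter. A small sketch using `urllib.parse`; `with_template` is a hypothetical helper that adds (or replaces) `template=` while keeping any other query parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_template(url: str, template: str = "HAMILTON_CONTRIB_PR_TEMPLATE.md") -> str:
    """Return url with template=<template> set, preserving other query parameters."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "template"  # replace any existing template parameter
    ]
    query.append(("template", template))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, assuming GitHub's `base...owner:branch` compare format, `with_template("https://github.com/DAGWorks-Inc/hamilton/compare/main...<your_github_username>:<your_branch>")` yields the compare page with the contrib template pre-selected.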


#### Dataflow standards
We want to ensure that the dataflows in this package are of high quality and are easy to use. To that end,
we have a set of standards that we expect all dataflows to meet. If you have any questions, please reach out.

Standards:
- The dataflow must be a valid Python module.
- It must not do anything malicious.
- It must be well documented.
- It must work.
- It must follow our standard structure as outlined below.
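The first standard, being a valid Python module, can be checked without running any code. The sketch below (hypothetical helper `public_function_names`) parses a file with `ast` and lists its top-level public functions, which in a Hamilton module roughly correspond to the nodes of the dataflow, since functions whose names start with an underscore are treated as helpers. Passing this check says nothing about the other standards, in particular that the dataflow works.

```python
import ast
from pathlib import Path
from typing import List


def public_function_names(module_path: Path) -> List[str]:
    """Parse (without executing) a module and list its top-level public functions.

    Raises SyntaxError if the file is not valid Python.
    """
    source = module_path.read_text(encoding="utf-8")
    tree = ast.parse(source, filename=str(module_path))
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]
```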

#### Getting started with development

To get started with development, first fork the Hamilton repository from the GitHub UI.

Then clone your fork locally and install the package in editable mode, including the dependencies required by the initialization script:
```bash
git clone https://github.com/<your_github_username>/hamilton.git  # your fork
cd hamilton
pip install -e "./contrib[contribute]"  # this package lives under the `contrib` folder
```

Next, you need to initialize your dataflow. This will create the necessary files and directories for you to get started.
```bash
init-dataflow -u <your_github_username> -n <name_of_dataflow>
```

This will do the following:

1. Create a package under `contrib/hamilton/contrib/user/<your_github_username>` with the appropriate files to describe you:
   - `author.md` -- this will describe you with links out to GitHub/socials
   - `__init__.py` -- this will be an empty file that allows you to import your dataflow
2. Create a package under `contrib/hamilton/contrib/user/<your_github_username>/<name_of_dataflow>` with the appropriate files to describe your dataflow:
   - `README.md` to describe the dataflow with the standard headings
   - `__init__.py` to contain the Hamilton code
   - `requirements.txt` to contain the required packages outside of Hamilton
   - `tags.json` to curate your dataflow
   - `valid_configs.jsonl` to specify the valid configurations for it to be run
   - `dag.png` to show one possible configuration of your dataflow
3. Add all the above files to git!

These files are all required. You do not have to use the initialization script; you can always copy the files over directly. That said, the script is idempotent (it fills in any missing files)
and ensures that you have the correct structure.
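If you copy the files over by hand, a quick check that nothing is missing can save a review round. The sketch below is a hypothetical helper, not part of the package or of `init-dataflow`: it takes the path to the `contrib` folder of your fork and reports which of the required files listed above are absent.

```python
from pathlib import Path
from typing import List

# Required files, copied from the list of what init-dataflow creates.
AUTHOR_FILES = ("author.md", "__init__.py")
DATAFLOW_FILES = (
    "README.md",
    "__init__.py",
    "requirements.txt",
    "tags.json",
    "valid_configs.jsonl",
    "dag.png",
)


def missing_files(contrib_root: Path, username: str, dataflow: str) -> List[str]:
    """Return required files (relative to contrib_root, POSIX-style) that do not exist."""
    user_dir = contrib_root / "hamilton" / "contrib" / "user" / username
    flow_dir = user_dir / dataflow
    expected = [user_dir / name for name in AUTHOR_FILES]
    expected += [flow_dir / name for name in DATAFLOW_FILES]
    return [
        path.relative_to(contrib_root).as_posix()
        for path in expected
        if not path.is_file()
    ]
```

For example, `missing_files(Path("contrib"), "<your_github_username>", "<name_of_dataflow>")` run from the root of your fork should return an empty list once everything is in place.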

#### Developing your dataflow

To get started, you'll want to do the following:

- [ ] Fill out your `__init__.py` with the appropriate code -- see [this issue](https://github.com/DAGWorks-Inc/hamilton/issues/559) if you want some inspiration for where to get started
- [ ] Fill out the sections of your `README.md` with the appropriate documentation -- follow one of the approved dataflows
- [ ] Fill out your `tags.json` with the appropriate tags -- follow one of the approved dataflows
- [ ] Fill out your `valid_configs.jsonl` with the appropriate configurations -- this is not necessary if you have no configurations that can change the shape of your DAG
- [ ] Generate a visual representation of your DAG -- you can use the following `if __name__ == '__main__'` block to do so:
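```python
if __name__ == "__main__":
    # Run this file directly (e.g. `python __init__.py` from the dataflow's directory).
    # Re-import it as a regular module so the driver can collect its functions.
    import __init__ as my_module

    from hamilton import base, driver

    dr = driver.Driver(
        {},  # config: fill in as appropriate
        my_module,
        adapter=base.DefaultAdapter(),
    )
    # create the DAG image (dag.png)
    dr.display_all_functions("dag", {"format": "png", "view": False})
```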
- [ ] Push a branch back to your fork
- [ ] Open up a pull request to the main Hamilton repo!
  - [ ] Commit the files we just added
  - [ ] Create a PR
  - [ ] Tag one of the maintainers [elijahbenizzy](https://github.com/elijahbenizzy), [skrawcz](https://github.com/skrawcz), or [zilto](https://github.com/zilto) for a review
  - [ ] Ping us on [Slack](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-1bjs72asx-wcUTgH7q7QX1igiQ5bbdcg) if you don't hear back within a few days
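`valid_configs.jsonl` is a JSON Lines file: one JSON value per line. The sketch below (hypothetical helper `load_valid_configs`) checks that every non-blank line parses as a JSON object and reports the offending line number otherwise. The keys used in the test data (`name`, `config`) are illustrative assumptions; model your own entries on an approved dataflow.

```python
import json
from typing import Any, Dict, List


def load_valid_configs(jsonl_text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines text, requiring one JSON object per non-blank line."""
    configs = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(value, dict):
            raise ValueError(
                f"line {line_number}: expected a JSON object, got {type(value).__name__}"
            )
        configs.append(value)
    return configs
```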

#### Username Management

As usernames map to packages, we need to ensure that they are valid. To that end, we have a few rules:
  - [ ] If your username contains hyphens, replace them with underscores.
  - [ ] If your username starts with a number, prefix it with an underscore.
  - [ ] If your username is a Python reserved keyword, reach out to the maintainers for help.

If any of the above apply, run the `init-dataflow` command with `-s` to specify a sanitized username.
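The rules above are mechanical enough to sketch. `sanitize_username` is a hypothetical helper, not the `init-dataflow` implementation: it replaces hyphens with underscores, prefixes a leading digit with an underscore, and raises for reserved keywords (where the rules say to contact the maintainers). The final `str.isidentifier()` check is an extra safeguard added here, not one of the stated rules.

```python
import keyword


def sanitize_username(username: str) -> str:
    """Turn a GitHub username into a valid Python package name per the rules above."""
    sanitized = username.replace("-", "_")  # rule 1: hyphens -> underscores
    if sanitized[:1].isdigit():  # rule 2: leading digit -> prefix with "_"
        sanitized = "_" + sanitized
    if keyword.iskeyword(sanitized):  # rule 3: keywords need maintainer help
        raise ValueError(f"{username!r} is a Python keyword; reach out to the maintainers")
    if not sanitized.isidentifier():  # extra safeguard, not one of the stated rules
        raise ValueError(f"{sanitized!r} is still not a valid identifier")
    return sanitized
```

If the result differs from your GitHub username, pass it to `init-dataflow` via `-s`.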

## Got questions?
Join our [Slack](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-1bjs72asx-wcUTgH7q7QX1igiQ5bbdcg) community to chat, ask questions, and more.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/dagworks-inc/hamilton/contrib",
    "name": "sf-hamilton-contrib",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.8, <4",
    "maintainer_email": "",
    "keywords": "hamilton,collaborative,shared,dataflow,library,contrib",
    "author": "Stefan Krawczyk, Elijah ben Izzy",
    "author_email": "stefan@dagworks.io,elijah@dagworks.io",
    "download_url": "https://files.pythonhosted.org/packages/d0/2e/7c6ffe3c739ed27524624ca03dd0c12ab16dd8997ec6c665fd3440078572/sf-hamilton-contrib-0.0.8.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Hamilton's user contributed shared dataflow library.",
    "version": "0.0.8",
    "project_urls": {
        "Bug Reports": "https://github.com/dagworks-inc/hamilton/issues",
        "Homepage": "https://github.com/dagworks-inc/hamilton/contrib",
        "Source": "https://github.com/dagworks-inc/hamilton/contrib"
    },
    "split_keywords": [
        "hamilton",
        "collaborative",
        "shared",
        "dataflow",
        "library",
        "contrib"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "4369167fce88b33351c5094d0ffe7d190ecfb0837cd08212af7490bb00a14766",
                "md5": "a86d07b0cfc5be803ed40e9d77b0190d",
                "sha256": "d08a44c154a1e2b26ebd77c158f7d2e0cea40d09960b3bcd68a5d421169503c3"
            },
            "downloads": -1,
            "filename": "sf_hamilton_contrib-0.0.8-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a86d07b0cfc5be803ed40e9d77b0190d",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8, <4",
            "size": 70908,
            "upload_time": "2024-03-11T22:33:12",
            "upload_time_iso_8601": "2024-03-11T22:33:12.070378Z",
            "url": "https://files.pythonhosted.org/packages/43/69/167fce88b33351c5094d0ffe7d190ecfb0837cd08212af7490bb00a14766/sf_hamilton_contrib-0.0.8-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d02e7c6ffe3c739ed27524624ca03dd0c12ab16dd8997ec6c665fd3440078572",
                "md5": "43143b69ca4e7c42cd8c91901c86d93f",
                "sha256": "75c1e1edd010f8d2ee95d1714dcf511ddadee2e1788fb056d42ed10806e0585a"
            },
            "downloads": -1,
            "filename": "sf-hamilton-contrib-0.0.8.tar.gz",
            "has_sig": false,
            "md5_digest": "43143b69ca4e7c42cd8c91901c86d93f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8, <4",
            "size": 58566,
            "upload_time": "2024-03-11T22:33:13",
            "upload_time_iso_8601": "2024-03-11T22:33:13.822361Z",
            "url": "https://files.pythonhosted.org/packages/d0/2e/7c6ffe3c739ed27524624ca03dd0c12ab16dd8997ec6c665fd3440078572/sf-hamilton-contrib-0.0.8.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-11 22:33:13",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "dagworks-inc",
    "github_project": "hamilton",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "circle": true,
    "requirements": [
        {
            "name": "numpy",
            "specs": []
        },
        {
            "name": "pandas",
            "specs": []
        },
        {
            "name": "typing_extensions",
            "specs": [
                [
                    ">",
                    "4.0.0"
                ]
            ]
        },
        {
            "name": "typing_inspect",
            "specs": []
        }
    ],
    "lcname": "sf-hamilton-contrib"
}
        