workfloz

Name: workfloz
Version: 0.1.0 (PyPI)
Summary: A simple library for building complex workflows.
Author email: Maël Jamet <maeljamet@hotmail.com>
Source code: https://github.com/maejam/workfloz
Requires Python: >=3.10
Keywords: workflows, pipelines
Upload time: 2024-04-17 13:10:14
# Workfloz
A simple library for building complex workflows.
___

Workfloz is meant to be very easy to use, abstracting away most of the complexity involved in building workflows.
This is done through extensions, where the complexity resides, and through a clean, easy-to-learn syntax.

## Installing
```shell
pip install workfloz
```

## Vision
Although Workfloz is built to be a general-purpose tool,
the first set of extensions will be about machine learning. Once stable, the library should be able to run the following code:
```python
# 1. Instantiate tools provided by extension
loader = CSVLoader("loader", file="data.csv") # Set as concrete directly.
processor = DataProcessor("processor")
processors = Pipeline("processors", processor.remove_duplicates())
builder = Abstract("builder") # Set as abstract and set concrete later.
trainer = ModelTrainer("trainer", auto_from=builder) # Automatically choose right trainer based on builder.
mlf_logger = MLFlowLogger("mlflogger", url="http://...")
file_logger = FileLogger("filelogger", dir="logs/")

# 2. Build workflow template
with Job("Machine Learning") as ML:

    with Task("prepare inputs", mode="async"): # 'async' applies on a line basis
        loader.load() | processors.run() > trainer.data
        builder.build() > trainer.model
    
    with Task("train", mode="async"):
        trainer.train()
        when("training_started", trainer) >> [mlf_logger.log_parameters(), file_logger.log_parameters()]
        when("epoch_ended", trainer) >> [mlf_logger.log_metrics(), file_logger.log_metrics()]
        when("training_ended", trainer) >> [mlf_logger.log_model(), file_logger.log_model()]
              
# 3. Define different Workflows from base template above.
forest10 = Job("forest-10", blueprint=ML)
# Set missing concrete strategies
forest10["builder"] = SKLForestBuilder(num_estimators=10)

forest50 = Job("forest-50", blueprint=ML)
forest50["builder"] = SKLForestBuilder(num_estimators=50)

forest50s = Job("forest-50s", blueprint=forest50)
# Add processor to Pipeline
processors.then(processor.Scale())

# 4. Start workflows	
forest10.start()
forest50.start()
forest50s.start()
```
In practice, steps 1 and 2 could be provided by the extension. The end user would only need to write steps 3 and 4.
Extensions for scikit-learn, Hugging Face and MLflow are planned.
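
The `blueprint` mechanism above is not implemented yet. As a rough illustration of the intent (all names below are hypothetical, not the workfloz API), a template Job could be copied and its abstract slots filled per variant:

```python
# Hypothetical sketch of the "blueprint" idea: a Job built from another Job
# copies its structure, and abstract slots can be set per copy.
import copy


class SketchJob:
    def __init__(self, name, blueprint=None):
        self.name = name
        # Deep-copy the template's parts so each variant is configured independently.
        self.parts = copy.deepcopy(blueprint.parts) if blueprint else {}

    def __setitem__(self, key, value):
        # Fill an abstract slot with a concrete strategy.
        self.parts[key] = value

    def __getitem__(self, key):
        return self.parts[key]


template = SketchJob("Machine Learning")
template["builder"] = "ABSTRACT"  # placeholder to be made concrete later

forest10 = SketchJob("forest-10", blueprint=template)
forest10["builder"] = "forest(n=10)"  # concrete strategy for this variant only

forest50 = SketchJob("forest-50", blueprint=template)
forest50["builder"] = "forest(n=50)"

print(template["builder"], forest10["builder"], forest50["builder"])
```

Each variant holds its own copy of the template's parts, so configuring `forest10` never mutates `template` or `forest50`.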

## Status of current version
The library is under active development, but it will take some time before the example above can run. The API should not be considered stable before v1.0.0 is released.  
The following example already works, though (available in '/examples'):

```python
import pandas as pd

from workfloz import ActionContainer
from workfloz import Job
from workfloz import Parameter
from workfloz import result
from workfloz import StringValidator


# Define tool
class CSVLoader(ActionContainer):  # Every method becomes an 'Action'
    """Return a pandas DataFrame from a CSV file."""

    # Attributes can be validated and documented
    file: str = Parameter(
        doc="The relative or absolute path.", validators=[StringValidator(max_len=50)]
    )
    separator: str = Parameter(default=",")

    def load(
        self, file, separator
    ):  # arguments will be filled in from above if not specified in call.
        return pd.read_csv(file, sep=separator)


# Instantiate tool
loader = CSVLoader("loader", file="iris.csv")
assert loader.file == "iris.csv"  # Attribute file is set on loader

# Define workflow
with Job("load data") as job:
    # A call to an 'Action' is recorded and will be executed on 'start'
    data = loader.load()
    # data = loader.load(separator=";")  # Attribute could be overridden, only for this call

# start Job and check result
job.start()
print(result(data))
```
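
Under the hood, a call to an Action inside a `Job` block is recorded rather than executed immediately, and replayed on `start()`. A minimal sketch of that deferred-execution idea (illustrative only; the names below are not the actual workfloz internals):

```python
# Hypothetical sketch: deferred execution of recorded calls.
class _Deferred:
    """Wraps a call so it runs later instead of immediately."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs
        self.value = None

    def run(self):
        self.value = self.func(*self.args, **self.kwargs)
        return self.value


class Recorder:
    """Context manager that collects deferred calls and replays them on start()."""

    def __init__(self, name):
        self.name = name
        self.pending = []

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def record(self, func, *args, **kwargs):
        action = _Deferred(func, *args, **kwargs)
        self.pending.append(action)
        return action  # a handle to the future result, like 'data' above


with Recorder("load data") as job:
    data = job.record(lambda: sum(range(5)))  # recorded, not yet executed

assert data.value is None  # nothing has run yet
job.start = lambda: [a.run() for a in job.pending]
job.start()
print(data.value)  # -> 10
```

The handle returned by `record` is what `result(...)` would later resolve: it stays empty until `start()` replays the pending calls in order.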


            
