origami-ml

Name: origami-ml
Version: 0.1.4
Home page: https://github.com/mongodb-labs/origami
Summary: An ML classifier model to make predictions from semi-structured data.
Upload time: 2025-02-12 01:43:45
Maintainer: None
Docs URL: None
Author: Thomas Rueckstiess
Requires Python: >=3.10
License: None
Requirements: click, click-option-group, guildai, jupyter, jupyter_contrib_nbextensions, lightgbm, matplotlib, mdbrtools, numpy, omegaconf, openml, pandas, pymongo, python-dotenv, pytest, ruff, scikit_learn, torch, tqdm, xgboost

# ORiGAMi - Object Representation through Generative Autoregressive Modelling

## Overview

ORiGAMi is a transformer-based Machine Learning model for supervised classification from semi-structured data such as MongoDB documents or JSON files.

Typically, when working with semi-structured data in a Machine Learning context, the data needs to be flattened into a tabular format first. This flattening can be lossy, especially in the presence of arrays and nested objects, and often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow, and expensive, and it doesn't scale well.
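
To make the flattening problem concrete, here is a minimal sketch of a naive flattener. It is not part of ORiGAMi; the `flatten` helper and the two sample documents are hypothetical and exist only for illustration. It shows that arrays of different lengths produce ragged columns, and that empty arrays disappear entirely.

```python
def flatten(value, prefix=""):
    """Naively flatten nested dicts/lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix.rstrip(".")] = value
    return flat


# Hypothetical documents whose "pets" arrays differ in length.
doc_a = {"name": "Ada", "pets": [{"kind": "cat"}, {"kind": "dog"}]}
doc_b = {"name": "Bo", "pets": [{"kind": "fish"}]}

flat_a = flatten(doc_a)
flat_b = flatten(doc_b)

# The table needs one column per array position, so doc_b has a gap.
columns = sorted(set(flat_a) | set(flat_b))
for column in columns:
    print(column, flat_a.get(column), flat_b.get(column))

# An empty array leaves no trace in the flattened output.
print(flatten({"tags": []}))
```

In this sketch, `doc_b` has no value for the `pets.1.kind` column, and the empty `tags` array cannot be distinguished from a missing field after flattening.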

ORiGAMi circumvents this by directly operating on JSON data. Once a model is trained, it can be used to make predictions on any field in the dataset.

## Installation

ORiGAMi requires Python version 3.10 or 3.11. We recommend using a virtual environment, such as
Python's native [`venv`](https://docs.python.org/3/library/venv.html).
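
If you want to confirm that your interpreter matches the versions named above before installing, a small check like the following may help. This is only a sketch: the `SUPPORTED` set and the `is_supported` helper are hypothetical names, and the version pairs are taken from the sentence above rather than from the package itself.

```python
import sys

# Assumption: supported (major, minor) pairs as stated in this README.
SUPPORTED = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is one listed in SUPPORTED."""
    return tuple(version_info[:2]) in SUPPORTED


if not is_supported():
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} is outside the versions listed in this README.")
```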

To install ORiGAMi with `pip`, use

```shell
pip install origami-ml
```

You can also clone the repository to your local machine and install the dependencies manually:

```shell
git clone https://github.com/mongodb-labs/origami.git
cd origami
pip install -r requirements.txt
pip install -e .
```

## Usage

ORiGAMi comes with a command line interface (CLI) and a Python SDK.

### Usage from the Command Line

The CLI lets you train a model and make predictions with a trained model. After installation, run `origami` from your shell to see an overview of available commands.

Help for specific commands is available with `origami <command> --help`, where `<command>` is currently one of `train` or `predict`. Note that the first run of the `origami` CLI tool can take longer than subsequent runs.
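
If you drive the CLI from a script, the help invocation described above can be wrapped as in the following sketch. The `help_command` and `show_help` helpers are hypothetical and not part of ORiGAMi; they only use the `train` and `predict` commands and the `--help` flag mentioned above, and they assume `origami` is on your `PATH`.

```python
import shutil
import subprocess

# Commands named in this README; other commands are not assumed to exist.
KNOWN_COMMANDS = ("train", "predict")


def help_command(command):
    """Build the argument list for `origami <command> --help`."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {command!r}")
    return ["origami", command, "--help"]


def show_help(command):
    """Run the help command and return its exit code, or None if not installed."""
    if shutil.which("origami") is None:
        print("The `origami` executable was not found on PATH.")
        return None
    return subprocess.run(help_command(command), check=False).returncode


if __name__ == "__main__":
    show_help("train")
```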

Detailed documentation for the CLI and available options can be found in [`CLI.md`](CLI.md).

### Usage with Python

For an example of how to use ORiGAMi from Python, take a look at the provided [./notebooks](./notebooks/) folder, e.g. the [`example_origami_dungeons.ipynb`](./notebooks/example_origami_dungeons.ipynb) notebook.

## Experiment Reproduction

This code is released alongside our paper, which can be found on arXiv: [ORIGAMI: A generative transformer architecture for predictions from semi-structured data](https://arxiv.org/abs/2412.17348). To reproduce the experiments in the paper, see the instructions in the [`./experiments/`](./experiments/) directory.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/mongodb-labs/origami",
    "name": "origami-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Thomas Rueckstiess",
    "author_email": "thomas.rueckstiess@mongodb.com",
    "download_url": "https://files.pythonhosted.org/packages/8e/87/817262c59f761a0537bc99b100397ea8358dfb6c05468daf554397ac6d14/origami_ml-0.1.4.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "An ML classifier model to make predictions from semi-structured data.",
    "version": "0.1.4",
    "project_urls": {
        "Homepage": "https://github.com/mongodb-labs/origami"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bd1f48d80349fe6cbdb6d6f2341433756b003ad579c314e24b428a2445f1203f",
                "md5": "1ce7684b28dfb7edaef1fd4fc9c8efb0",
                "sha256": "c95a5ac9b61119b155fe3508b936b56827f231c8f3eda4641d4d12ea62648b03"
            },
            "downloads": -1,
            "filename": "origami_ml-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1ce7684b28dfb7edaef1fd4fc9c8efb0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 53183,
            "upload_time": "2025-02-12T01:43:29",
            "upload_time_iso_8601": "2025-02-12T01:43:29.007887Z",
            "url": "https://files.pythonhosted.org/packages/bd/1f/48d80349fe6cbdb6d6f2341433756b003ad579c314e24b428a2445f1203f/origami_ml-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8e87817262c59f761a0537bc99b100397ea8358dfb6c05468daf554397ac6d14",
                "md5": "4486eb3ff4aeff4b0d618278080b7456",
                "sha256": "53defba44b01c27bb6e5703a387e8261c86ef3fc75e846d5185fe95da1c6103d"
            },
            "downloads": -1,
            "filename": "origami_ml-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "4486eb3ff4aeff4b0d618278080b7456",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 45726,
            "upload_time": "2025-02-12T01:43:45",
            "upload_time_iso_8601": "2025-02-12T01:43:45.771990Z",
            "url": "https://files.pythonhosted.org/packages/8e/87/817262c59f761a0537bc99b100397ea8358dfb6c05468daf554397ac6d14/origami_ml-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-12 01:43:45",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mongodb-labs",
    "github_project": "origami",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "click-option-group",
            "specs": [
                [
                    "==",
                    "0.5.6"
                ]
            ]
        },
        {
            "name": "guildai",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "jupyter",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "jupyter_contrib_nbextensions",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "lightgbm",
            "specs": [
                [
                    "==",
                    "4.5.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.9.2"
                ]
            ]
        },
        {
            "name": "mdbrtools",
            "specs": [
                [
                    "==",
                    "0.1.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "omegaconf",
            "specs": [
                [
                    "==",
                    "2.3.0"
                ]
            ]
        },
        {
            "name": "openml",
            "specs": [
                [
                    "==",
                    "0.15.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "pymongo",
            "specs": [
                [
                    "==",
                    "4.8.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.3.3"
                ]
            ]
        },
        {
            "name": "ruff",
            "specs": [
                [
                    "==",
                    "0.9.3"
                ]
            ]
        },
        {
            "name": "scikit_learn",
            "specs": [
                [
                    "==",
                    "1.5.2"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.4.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.4"
                ]
            ]
        },
        {
            "name": "xgboost",
            "specs": [
                [
                    "==",
                    "2.1.3"
                ]
            ]
        }
    ],
    "lcname": "origami-ml"
}
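
The `requirements` entries in the raw data above store each version constraint as an `[operator, version]` pair under `specs`. As a minimal sketch, assuming that structure, the hypothetical `to_pins` helper below turns such entries into pip-style requirement lines. The `sample` list copies two entries from the raw data.

```python
def to_pins(requirements):
    """Convert {"name": ..., "specs": [[op, version], ...]} entries to pip-style lines."""
    pins = []
    for requirement in requirements:
        # Join multiple constraints with commas, e.g. "pkg>=1.0,<2.0".
        constraints = ",".join(op + version for op, version in requirement["specs"])
        pins.append(requirement["name"] + constraints)
    return pins


# Two entries copied from the raw data above.
sample = [
    {"name": "click", "specs": [["==", "8.1.7"]]},
    {"name": "torch", "specs": [["==", "2.4.1"]]},
]

print("\n".join(to_pins(sample)))
```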
        