origami-ml

Name	origami-ml
Version	0.2.0
home_page	https://github.com/rueckstiess/origami
Summary	An ML classifier model to make predictions from semi-structured data.
upload_time	2025-10-23 04:49:38
maintainer	None
docs_url	None
author	Thomas Rueckstiess
requires_python	>=3.10
license	None
keywords
VCS
bugtrack_url
requirements	click click-option-group guildai jupyter jupyter_contrib_nbextensions lightgbm matplotlib mdbrtools numpy omegaconf openml pandas pymongo python-dotenv pytest ruff scikit_learn torch tqdm xgboost
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

<p align="center">
  <img src="assets/origami_logo.jpg" style="width: 100%; height: auto;">
</p>

# ORiGAMi - Object Representation through Generative Autoregressive Modelling

<p align="center">
| <a href="https://arxiv.org/abs/2412.17348"><b>ORiGAMi Paper on arXiv</b></a> |
</p>

## Disclaimer

This is a personal fork of the original [mongodb-labs/origami](https://github.com/mongodb-labs/origami) project. While I was the original author, I have since left MongoDB and am continuing development and maintenance of this fork independently.

This tool is not officially supported or endorsed by MongoDB, Inc. The code is released for use "AS IS" without warranties of any kind, including, but not limited to, its installation, use, or performance. Do not run this tool against critical production systems.

## Overview

ORiGAMi is a transformer-based Machine Learning model for supervised classification from semi-structured data such as MongoDB documents or JSON files.

Typically, when working with semi-structured data in a Machine Learning context, the data first needs to be flattened into a tabular format. This flattening can be lossy, especially in the presence of arrays and nested objects, and often requires domain expertise to extract meaningful higher-order features from the raw data. This feature extraction step is manual, slow, and expensive, and it doesn't scale well.

ORiGAMi circumvents this by directly operating on JSON data. Once a model is trained, it can be used to make predictions on any field in the dataset.
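
To make the flattening problem concrete, here is a minimal sketch in plain Python. It does not use ORiGAMi; the `flatten` helper and the two sample documents are hypothetical and exist only for illustration.

```python
from typing import Any


def flatten(doc: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into one level with dotted keys."""
    flat: dict[str, Any] = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix.rstrip(".")] = doc
    return flat


# Two hypothetical documents with the same shape but different array lengths.
docs = [
    {"name": "ann", "tags": ["a", "b"], "address": {"city": "Sydney"}},
    {"name": "bob", "tags": ["c"], "address": {"city": "Perth"}},
]

rows = [flatten(d) for d in docs]
# The table needs the union of all keys; missing cells become None.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

The array position ends up in the column name (`tags.0`, `tags.1`), so shorter arrays leave empty cells and the same tag can land in different columns depending on its position. This is the kind of information loss that operating on the JSON directly avoids.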

## Installation

ORiGAMi targets Python 3.11; the published package metadata declares `requires_python >=3.10`. We recommend using [`uv`](https://docs.astral.sh/uv/) for dependency management and virtual environments.

### Install from PyPI

```shell
pip install origami-ml
```
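
After installing, you may want to confirm that the package is visible to your interpreter. The sketch below uses only the standard library (`importlib.metadata`); the helper names `installed_version` and `python_is_supported` are made up for this example, and the `(3, 10)` default is taken from the package's declared `requires_python`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def python_is_supported(minimum: Tuple[int, int] = (3, 10)) -> bool:
    """Check the running interpreter against a minimum (major, minor)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    found = installed_version("origami-ml")
    print("origami-ml:", found if found is not None else "not installed")
    print("Python version supported:", python_is_supported())
```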

### Install from source with uv (recommended for development)

First, install `uv` if you haven't already:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then clone and install the project:

```shell
git clone https://github.com/rueckstiess/origami.git
cd origami
uv sync --extra dev
```

This will automatically create a virtual environment, install Python 3.11 if needed, and install all dependencies.

To run commands in the uv environment:

```shell
uv run origami --help
uv run pytest
```

## Usage

ORiGAMi comes with a command line interface (CLI) and a Python SDK.

### Usage from the Command Line

The CLI lets you train a model and make predictions with a trained model. After installation, run `origami` from your shell to see an overview of available commands.

Help for specific commands is available with `origami <command> --help`, where `<command>` is currently one of `train` or `predict`. Note that the first run of the `origami` CLI can take longer than subsequent runs.

Detailed documentation for the CLI and available options can be found in [`CLI.md`](CLI.md).
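
If you want to call the CLI from a script, one possible approach is a thin `subprocess` wrapper. The sketch below only uses the `--help` invocations mentioned above; the helper names `help_command` and `show_help` are hypothetical, and no other flags or options are assumed.

```python
import shutil
import subprocess
from typing import List, Optional

# Subcommands named in this README; others may exist (see CLI.md).
KNOWN_COMMANDS = ("train", "predict")


def help_command(subcommand: Optional[str] = None) -> List[str]:
    """Build the argument list for `origami --help` or `origami <command> --help`."""
    if subcommand is None:
        return ["origami", "--help"]
    if subcommand not in KNOWN_COMMANDS:
        raise ValueError(f"unknown command: {subcommand!r}")
    return ["origami", subcommand, "--help"]


def show_help(subcommand: Optional[str] = None) -> Optional[str]:
    """Run the help command and return its output, or None if origami is not on PATH."""
    if shutil.which("origami") is None:
        return None
    result = subprocess.run(
        help_command(subcommand), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    output = show_help("train")
    print(output if output is not None else "origami is not on PATH")
```

Passing the arguments as a list avoids shell quoting issues, and checking `shutil.which` first gives a clear result when the CLI is not installed in the active environment.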

### Usage with Python

For an example of how to use ORiGAMi from Python, take a look at the provided [./notebooks](./notebooks/) folder, e.g. the [`example_origami_dungeons.ipynb`](./notebooks/example_origami_dungeons.ipynb) notebook.

## Experiment Reproduction

This code is released alongside our paper, which can be found on arXiv: [ORIGAMI: A generative transformer architecture for predictions from semi-structured data](https://arxiv.org/abs/2412.17348). To reproduce the experiments in the paper, see the instructions in the [`./experiments/`](./experiments/) directory.

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/rueckstiess/origami",
    "name": "origami-ml",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": null,
    "author": "Thomas Rueckstiess",
    "author_email": "Thomas Rueckstiess <me@tomr.au>",
    "download_url": "https://files.pythonhosted.org/packages/c4/09/df2b6dab0194a67872e9f4de5a8f5dfb25e0c9481f2ca5b6df4184d75eb5/origami_ml-0.2.0.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "An ML classifier model to make predictions from semi-structured data.",
    "version": "0.2.0",
    "project_urls": {
        "Homepage": "https://github.com/rueckstiess/origami",
        "Issues": "https://github.com/rueckstiess/origami/issues",
        "Repository": "https://github.com/rueckstiess/origami"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "93632c3ab46401e83ac703b7c69264a8b10bbaca6529517184267d6ad03c3cff",
                "md5": "6654ec55eb2c89a09cc630b99f74a0f5",
                "sha256": "9c51e073680bc9ff14856b8135557803118c05433cfd00c23f05e059c5f13a7b"
            },
            "downloads": -1,
            "filename": "origami_ml-0.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6654ec55eb2c89a09cc630b99f74a0f5",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 63221,
            "upload_time": "2025-10-23T04:49:36",
            "upload_time_iso_8601": "2025-10-23T04:49:36.888171Z",
            "url": "https://files.pythonhosted.org/packages/93/63/2c3ab46401e83ac703b7c69264a8b10bbaca6529517184267d6ad03c3cff/origami_ml-0.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c409df2b6dab0194a67872e9f4de5a8f5dfb25e0c9481f2ca5b6df4184d75eb5",
                "md5": "3e15a413835eff591e2accebbe79eade",
                "sha256": "23f180e3f5789f5cd890d46140d8732a6af56f8a81711f374b7f8857d38cd48f"
            },
            "downloads": -1,
            "filename": "origami_ml-0.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "3e15a413835eff591e2accebbe79eade",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 53591,
            "upload_time": "2025-10-23T04:49:38",
            "upload_time_iso_8601": "2025-10-23T04:49:38.173355Z",
            "url": "https://files.pythonhosted.org/packages/c4/09/df2b6dab0194a67872e9f4de5a8f5dfb25e0c9481f2ca5b6df4184d75eb5/origami_ml-0.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-23 04:49:38",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "rueckstiess",
    "github_project": "origami",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "click",
            "specs": [
                [
                    "==",
                    "8.1.7"
                ]
            ]
        },
        {
            "name": "click-option-group",
            "specs": [
                [
                    "==",
                    "0.5.6"
                ]
            ]
        },
        {
            "name": "guildai",
            "specs": [
                [
                    "==",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "jupyter",
            "specs": [
                [
                    "==",
                    "1.1.1"
                ]
            ]
        },
        {
            "name": "jupyter_contrib_nbextensions",
            "specs": [
                [
                    "==",
                    "0.7.0"
                ]
            ]
        },
        {
            "name": "lightgbm",
            "specs": [
                [
                    "==",
                    "4.5.0"
                ]
            ]
        },
        {
            "name": "matplotlib",
            "specs": [
                [
                    "==",
                    "3.9.2"
                ]
            ]
        },
        {
            "name": "mdbrtools",
            "specs": [
                [
                    "==",
                    "0.1.1"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    "==",
                    "1.26.4"
                ]
            ]
        },
        {
            "name": "omegaconf",
            "specs": [
                [
                    "==",
                    "2.3.0"
                ]
            ]
        },
        {
            "name": "openml",
            "specs": [
                [
                    "==",
                    "0.15.1"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    "==",
                    "2.2.3"
                ]
            ]
        },
        {
            "name": "pymongo",
            "specs": [
                [
                    "==",
                    "4.8.0"
                ]
            ]
        },
        {
            "name": "python-dotenv",
            "specs": [
                [
                    "==",
                    "1.0.1"
                ]
            ]
        },
        {
            "name": "pytest",
            "specs": [
                [
                    "==",
                    "8.3.3"
                ]
            ]
        },
        {
            "name": "ruff",
            "specs": [
                [
                    "==",
                    "0.9.3"
                ]
            ]
        },
        {
            "name": "scikit_learn",
            "specs": [
                [
                    "==",
                    "1.5.2"
                ]
            ]
        },
        {
            "name": "torch",
            "specs": [
                [
                    "==",
                    "2.4.1"
                ]
            ]
        },
        {
            "name": "tqdm",
            "specs": [
                [
                    "==",
                    "4.66.4"
                ]
            ]
        },
        {
            "name": "xgboost",
            "specs": [
                [
                    "==",
                    "2.1.3"
                ]
            ]
        }
    ],
    "lcname": "origami-ml"
}
