alea-preprocess


| Field | Value |
|---|---|
| Name | alea-preprocess |
| Version | 0.1.12 |
| Home page | https://aleainstitute.ai/ |
| Summary | Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute. |
| Upload time | 2024-10-08 12:58:53 |
| Maintainer | None |
| Docs URL | None |
| Author | ALEA Institute <hello@aleainstitute.ai> |
| Requires Python | `<4.0,>=3.10` |
| License | MIT |
| Keywords | alea, llm, data, preprocess, pretrain, kl3m |
| Requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| Coveralls test coverage | No coveralls. |

# alea-preprocess

[![PyPI version](https://badge.fury.io/py/alea-preprocess.svg)](https://badge.fury.io/py/alea-preprocess)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Versions](https://img.shields.io/pypi/pyversions/alea-preprocess.svg)](https://pypi.org/project/alea-preprocess/)

## Description
Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation.

This library is part of ALEA's open-source large language model training pipeline, which is used in the research and development
of the [KL3M](https://kl3m.ai/) project.


## Installation

This project is a work in progress and relies on compiled Rust code. Until a stable release is available, installing
the package from the GitHub source is recommended.

Alternatively, you can install the latest release from PyPI using pip:
```
pip install alea-preprocess
```
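
After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is made up for illustration and is not part of alea-preprocess.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="alea-preprocess"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("alea-preprocess is not installed in this environment")
    else:
        print(f"alea-preprocess {found} is installed")
```

A `None` result usually means pip installed into a different environment than the one running the script.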

To build and install a development version, run the following command from a checkout of the source repository:
```
poetry run maturin develop
```
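
A source build needs more than pip, so it can help to check the toolchain first. The sketch below assumes that Cargo (for the Rust code) and Poetry are expected on `PATH`; that tool list and the helper name `missing_build_tools` are assumptions for illustration, not requirements stated by this README.

```python
import shutil

# Assumed prerequisites for a source build; adjust to match your setup.
REQUIRED_TOOLS = ("cargo", "poetry")


def missing_build_tools(tools=REQUIRED_TOOLS):
    """Return the names of the given tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("Build tools found; run `poetry run maturin develop` next.")
```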


## Examples
Example use cases are currently available in the `tests/` directory of the [source repository](https://github.com/alea-institute/alea-preprocess).

Additional documentation and examples will be provided in the future.

## License

This ALEA project is released under the MIT License. See the [LICENSE](LICENSE) file for details.

## Support

If you encounter any issues or have questions about using this ALEA project, please [open an issue](https://github.com/alea-institute/alea-preprocess/issues) on GitHub.

## Learn More

To learn more about ALEA and its software and research projects like KL3M, visit the [ALEA website](https://aleainstitute.ai/).


            

Raw data

            {
    "_id": null,
    "home_page": "https://aleainstitute.ai/",
    "name": "alea-preprocess",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.10",
    "maintainer_email": null,
    "keywords": "alea, llm, data, preprocess, pretrain, kl3m",
    "author": "ALEA Institute <hello@aleainstitute.ai>",
    "author_email": "ALEA Institute <hello@aleainstitute.ai>",
    "download_url": "https://files.pythonhosted.org/packages/3d/65/3ca921037b1f7d14cef1aa7dd04d3ef5ead63e53b2ecb434e12cfcef6cd7/alea_preprocess-0.1.12.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Efficient, accessible preprocessing routines for pretrain, SFT, and DPO training data preparation from the ALEA Institute.",
    "version": "0.1.12",
    "project_urls": {
        "Homepage": "https://aleainstitute.ai/",
        "Source Code": "https://github.com/alea-institute/alea-preprocess"
    },
    "split_keywords": [
        "alea",
        " llm",
        " data",
        " preprocess",
        " pretrain",
        " kl3m"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "045b0e9a0d630fbbe6cddac68a76dac8638467fc1f78b4758370f8b580c4231f",
                "md5": "fc2c076a09b75b0aef31ac3ca8869509",
                "sha256": "cd12af2ad853b5e91b83676157fda337cb35c9db95bdf871cae1a50b45620519"
            },
            "downloads": -1,
            "filename": "alea_preprocess-0.1.12-cp312-cp312-manylinux_2_34_x86_64.whl",
            "has_sig": false,
            "md5_digest": "fc2c076a09b75b0aef31ac3ca8869509",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": "<4.0,>=3.10",
            "size": 9180378,
            "upload_time": "2024-10-08T12:58:51",
            "upload_time_iso_8601": "2024-10-08T12:58:51.552623Z",
            "url": "https://files.pythonhosted.org/packages/04/5b/0e9a0d630fbbe6cddac68a76dac8638467fc1f78b4758370f8b580c4231f/alea_preprocess-0.1.12-cp312-cp312-manylinux_2_34_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "3d653ca921037b1f7d14cef1aa7dd04d3ef5ead63e53b2ecb434e12cfcef6cd7",
                "md5": "c1770ab549946a003018983400000ee7",
                "sha256": "be4c6f0b8dc81bcf27eadee4bdad825cd27f8a8c40e73789fec6b82ef921124a"
            },
            "downloads": -1,
            "filename": "alea_preprocess-0.1.12.tar.gz",
            "has_sig": false,
            "md5_digest": "c1770ab549946a003018983400000ee7",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.10",
            "size": 82804,
            "upload_time": "2024-10-08T12:58:53",
            "upload_time_iso_8601": "2024-10-08T12:58:53.774828Z",
            "url": "https://files.pythonhosted.org/packages/3d/65/3ca921037b1f7d14cef1aa7dd04d3ef5ead63e53b2ecb434e12cfcef6cd7/alea_preprocess-0.1.12.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-10-08 12:58:53",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "alea-institute",
    "github_project": "alea-preprocess",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "alea-preprocess"
}
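
The `digests` entries in the raw data above can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_digest` are hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA-256 recorded above for alea_preprocess-0.1.12.tar.gz
EXPECTED_SDIST_SHA256 = (
    "be4c6f0b8dc81bcf27eadee4bdad825cd27f8a8c40e73789fec6b82ef921124a"
)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was downloaded to the current directory:
# matches_digest("alea_preprocess-0.1.12.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can perform the same comparison itself when a requirements file pins hashes and `--require-hashes` is used.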
        