bodo


- Name: bodo
- Version: 2025.1
- home_page: None
- Summary: High-Performance Python Compute Engine for Data and AI
- upload_time: 2025-01-13 22:40:18
- maintainer: None
- docs_url: None
- author: Bodo.ai
- requires_python: <3.13,>=3.10
- license: None
- keywords: data, analytics, cluster
- requirements: No requirements were recorded.
- Travis-CI: No Travis.
- coveralls test coverage: No coveralls.
            <!--
NOTE: the example in this file is covered by tests in bodo/tests/test_quickstart_docs.py. Any changes to the examples in this file should also update the corresponding unit test.
 -->

![Logo](Assets/bodo.png)

<h3 align="center">
  <a href="https://docs.bodo.ai/latest/" target="_blank"><b>Docs</b></a>
  &nbsp;&#183;&nbsp;
  <a href="https://bodocommunity.slack.com/join/shared_invite/zt-qwdc8fad-6rZ8a1RmkkJ6eOX1X__knA#/shared-invite/email" target="_blank"><b>Slack</b></a>
  &nbsp;&#183;&nbsp;
  <a href="https://www.bodo.ai/benchmarks/" target="_blank"><b>Benchmarks</b></a>
</h3>

# Bodo: High-Performance Python Compute Engine for Data and AI

Bodo is a cutting-edge compute engine for large-scale Python data processing. Powered by an innovative auto-parallelizing just-in-time (JIT) compiler, Bodo transforms Python programs into highly optimized, parallel binaries without requiring code rewrites, which makes it [20x to 240x faster](https://github.com/bodo-ai/Bodo/tree/main/benchmarks/nyc_taxi) than alternatives on the NYC Taxi benchmark.

<img src="benchmarks/img/nyc-taxi-benchmark.png" alt="NYC Taxi Benchmark" width="500"/>

Unlike traditional distributed computing frameworks, Bodo:
- Seamlessly supports native Python APIs like Pandas and NumPy.
- Eliminates runtime overheads common in driver-executor models by leveraging Message Passing Interface (MPI) technology for true distributed execution.
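
To make the contrast with driver-executor models more concrete, the sketch below shows the single-program, multiple-data idea that MPI programs typically follow: every process runs the same code and works out its own slice of the data from its rank. The helper `block_range` is hypothetical and only illustrates one common way to split rows into contiguous blocks; it is not taken from Bodo's code.

```python
def block_range(num_rows: int, rank: int, num_ranks: int) -> range:
    """Return the contiguous block of row indices owned by `rank`.

    Hypothetical helper for illustration: the first `num_rows % num_ranks`
    ranks receive one extra row, so block sizes differ by at most one.
    """
    base, extra = divmod(num_rows, num_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Simulate 4 processes splitting 10 rows; in a real MPI program each
# process would compute only its own block.
blocks = [block_range(10, rank, 4) for rank in range(4)]
for rank, block in enumerate(blocks):
    print(f"rank {rank}: rows {block.start}..{block.stop - 1}")
```

Because each process can derive its block locally, no central driver has to hand out tasks at runtime.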

## Goals

Bodo makes Python run much (much!) faster than it normally does!

1. **Exceptional Performance:**
Deliver HPC-grade performance and scalability for Python data workloads as if the code were written in C++/MPI, whether running on a laptop or across large cloud clusters.

2. **Easy to Use:**
Easily integrate into Python workflows with a simple decorator, and support native Pandas and NumPy APIs.

3. **Interoperable:**
Compatible with the regular Python ecosystem, and can selectively speed up only the functions that Bodo supports.

4. **Integration with Modern Data Infrastructure:**
Provide robust support for industry-leading data platforms like Apache Iceberg and Snowflake, enabling smooth interoperability with existing ecosystems.
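
As a rough illustration of the "simple decorator" and "selectively speed up" goals, the sketch below decorates a single function while the rest of the program stays ordinary Python. The fallback `jit` defined in the `except` branch is a hypothetical no-op stand-in so the snippet still runs where Bodo is not installed; it is not part of Bodo.

```python
try:
    import bodo  # provides the bodo.jit decorator

    jit = bodo.jit
except ImportError:
    def jit(*args, **kwargs):
        """Hypothetical no-op stand-in used only when Bodo is not installed."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]          # used as @jit
        return lambda func: func    # used as @jit(...)


@jit
def sum_of_squares(n):
    # Simple numeric loop; runs as plain Python under the fallback.
    total = 0
    for i in range(n):
        total += i * i
    return total


def describe(n):
    # Regular, undecorated Python code calls the decorated function as usual.
    return f"sum of squares below {n} is {sum_of_squares(n)}"


print(describe(4))
```

Only `sum_of_squares` is handed to the compiler; `describe` remains regular Python and calls it like any other function.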


## Non-goals

1. *Full Python Language Support:*
We are currently focused on a targeted subset of Python used for data-intensive and computationally heavy workloads, rather than supporting the entire Python syntax and all library APIs.

2. *Non-Data Workloads:*
Bodo prioritizes applications in data engineering, data science, and AI/ML. It is not designed for general-purpose use cases that are not data-centric.

3. *Real-time Compilation:*
While compilation time is improving, Bodo is not yet optimized for scenarios requiring very short compilation times (e.g., workloads with execution times of only a few seconds).
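
With JIT compilers in general, the first call to a function pays the compilation cost and later calls reuse the compiled code. One way to see how much this matters for a given workload is to time consecutive calls, as in the sketch below. Both `time_calls` and `workload` are hypothetical names, and `workload` is a plain-Python placeholder for a JIT-compiled function.

```python
import time


def time_calls(func, *args, repeats=2):
    """Return wall-clock durations (seconds) of consecutive calls to `func`."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return durations


def workload(n):
    # Placeholder workload; in practice this would be a JIT-compiled function.
    return sum(i * i for i in range(n))


first, second = time_calls(workload, 100_000)
print(f"first call: {first:.4f}s, second call: {second:.4f}s")
```

If the first duration dominates and the function only runs for a few seconds, compilation overhead may outweigh the speedup.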


## Key Features

- Automatic optimization & parallelization of Python programs using Pandas and NumPy.
- Linear scalability from laptops to large-scale clusters and supercomputers.
- Advanced scalable I/O support for Iceberg, Snowflake, Parquet, CSV, and JSON with automatic filter pushdown and column pruning for optimized data access.
- High-performance SQL engine that is natively integrated into Python.

See the Bodo documentation to learn more: https://docs.bodo.ai/
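
For readers unfamiliar with the terms in the feature list, the sketch below shows by hand what filter pushdown and column pruning mean conceptually: rows are filtered while the data is scanned, and only the requested columns are kept. The `scan` helper and the sample data are made up for illustration and say nothing about how Bodo implements these optimizations. With a columnar format such as Parquet, a real engine can go further and avoid reading unused columns at all.

```python
import csv
import io

# Tiny in-memory CSV standing in for a large file (made-up sample data).
SAMPLE = """id,city,fare,tip
1,NYC,12.5,2.0
2,Boston,8.0,1.0
3,NYC,30.0,5.5
"""


def scan(source, columns, predicate):
    """Yield only the requested columns of rows that satisfy `predicate`."""
    for row in csv.DictReader(source):
        if predicate(row):  # filter applied during the scan
            yield {name: row[name] for name in columns}  # other columns dropped


rows = list(scan(io.StringIO(SAMPLE), ["id", "fare"], lambda r: r["city"] == "NYC"))
print(rows)
```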


## Installation

Note: Bodo requires Python 3.10, 3.11, or 3.12.

Bodo can be installed using pip or Conda:

```bash
pip install -U bodo
```

or

```bash
conda create -n Bodo python=3.12 -c conda-forge
conda activate Bodo
conda install bodo -c bodo.ai -c conda-forge
```

Bodo currently supports Linux x86 and macOS (both x86 and ARM). Windows support (and more) is coming soon!
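
Before installing, it can help to check the environment against the requirements above. The sketch below is one possible check. The function names are hypothetical, and the machine strings (`x86_64`, `arm64`) are assumptions based on what `platform.machine()` commonly reports on these systems.

```python
import platform
import sys


def python_supported(version_info) -> bool:
    """True for Python 3.10, 3.11 and 3.12 (the `<3.13,>=3.10` range)."""
    return (3, 10) <= tuple(version_info[:2]) < (3, 13)


def platform_supported(system: str, machine: str) -> bool:
    """True for Linux x86-64 and macOS on x86-64 or ARM64."""
    supported = {
        "Linux": {"x86_64"},
        "Darwin": {"x86_64", "arm64"},
    }
    return machine in supported.get(system, set())


print("Python OK:", python_supported(sys.version_info))
print("Platform OK:", platform_supported(platform.system(), platform.machine()))
```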

## Example Code

Here is example Pandas code that generates a sample Parquet dataset, then reads and processes it with Bodo.


```python
import pandas as pd
import numpy as np
import bodo
import time

# Generate sample data
NUM_GROUPS = 30
NUM_ROWS = 20_000_000

df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")

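# Compile the function with Bodo; cache=True caches the compiled code
# so that later runs can skip recompilation
@bodo.jit(cache=True)
def computation():
    t1 = time.time()
    df = pd.read_parquet("my_data.pq")
    # Per-row integer division B // A, guarded against division by zero
    df2 = pd.DataFrame({"A": df.apply(lambda r: 0 if r.A == 0 else (r.B // r.A), axis=1)})
    df2.to_parquet("out.pq")
    print("Execution time:", time.time() - t1)

computation()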
```
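
To see what the per-row lambda in the example computes, the short sketch below applies the same expression to a few hand-picked rows in plain Python. The helper name `row_result` is hypothetical. It relies only on how the sample data is generated: row `i` has `A = i % NUM_GROUPS` and `B = i`.

```python
NUM_GROUPS = 30


def row_result(a: int, b: int) -> int:
    """Same expression as the lambda in the example above."""
    return 0 if a == 0 else b // a


# Row i of the generated data has A = i % NUM_GROUPS and B = i.
for i in (0, 1, 31, 45, 59):
    a, b = i % NUM_GROUPS, i
    print(f"row {i}: A={a}, B={b} -> {row_result(a, b)}")
```

Rows where `A` is 0 (every 30th row) produce 0 instead of raising a division error.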

## How to Contribute

Please read our latest [project contribution guide](CONTRIBUTING.md).

## Getting involved

Join our community and collaborate with other contributors on our [Slack channel](https://bodocommunity.slack.com/join/shared_invite/zt-qwdc8fad-6rZ8a1RmkkJ6eOX1X__knA#/shared-invite/email) – we’re excited to hear your ideas and help you get started!

[![codecov](https://codecov.io/github/bodo-ai/Bodo/graph/badge.svg?token=zYHQy0R9ck)](https://codecov.io/github/bodo-ai/Bodo)
            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "bodo",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<3.13,>=3.10",
    "maintainer_email": null,
    "keywords": "data, analytics, cluster",
    "author": "Bodo.ai",
    "author_email": null,
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "High-Performance Python Compute Engine for Data and AI",
    "version": "2025.1",
    "project_urls": {
        "Documentation": "https://docs.bodo.ai",
        "Homepage": "https://bodo.ai",
        "Repository": "https://github.com/bodo-ai/Bodo"
    },
    "split_keywords": [
        "data",
        " analytics",
        " cluster"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8e113ad634bd328e31235c8fb6ca13724c60cfe234c54511d09eb3fd1633aa7a",
                "md5": "a5f33551a508380ec406427c1769700c",
                "sha256": "ab77bb5af5dd1239d5878ffd3f58dd0e882d0ba96225797c99fc4585d4635b64"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp310-cp310-macosx_10_15_x86_64.whl",
            "has_sig": false,
            "md5_digest": "a5f33551a508380ec406427c1769700c",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": "<3.13,>=3.10",
            "size": 45643867,
            "upload_time": "2025-01-13T22:40:18",
            "upload_time_iso_8601": "2025-01-13T22:40:18.580839Z",
            "url": "https://files.pythonhosted.org/packages/8e/11/3ad634bd328e31235c8fb6ca13724c60cfe234c54511d09eb3fd1633aa7a/bodo-2025.1-cp310-cp310-macosx_10_15_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "d993f24ec878c8bd4376b6f48a4839050d876cef3f77ba0861b50b0b907e81f7",
                "md5": "2f90f9fc44e83b5d4e767e7ef60ce881",
                "sha256": "e32e219793426fbb1a91d8ec599979772a5d744a831759b579c5ce7e9879ddb6"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp310-cp310-macosx_12_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "2f90f9fc44e83b5d4e767e7ef60ce881",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": "<3.13,>=3.10",
            "size": 31919420,
            "upload_time": "2025-01-13T22:39:48",
            "upload_time_iso_8601": "2025-01-13T22:39:48.966482Z",
            "url": "https://files.pythonhosted.org/packages/d9/93/f24ec878c8bd4376b6f48a4839050d876cef3f77ba0861b50b0b907e81f7/bodo-2025.1-cp310-cp310-macosx_12_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a36698d40c66eb8a1078402995651694e23f0accb51da98b7ac289dd06bbfbc3",
                "md5": "6409f7af7b1f18a0bffd251ab00cd3b0",
                "sha256": "21a30bc2684b08fece97bd10671ee7ca220abd550201496208ff0fa2acc0d314"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp310-cp310-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "6409f7af7b1f18a0bffd251ab00cd3b0",
            "packagetype": "bdist_wheel",
            "python_version": "cp310",
            "requires_python": "<3.13,>=3.10",
            "size": 47194084,
            "upload_time": "2025-01-13T22:38:54",
            "upload_time_iso_8601": "2025-01-13T22:38:54.291246Z",
            "url": "https://files.pythonhosted.org/packages/a3/66/98d40c66eb8a1078402995651694e23f0accb51da98b7ac289dd06bbfbc3/bodo-2025.1-cp310-cp310-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e78c76d6e17966514e3f0dbb37c5765bf762a3e3f1712fc250df484f9471a308",
                "md5": "b068d9c898bc92fdb509baeaca1f42d4",
                "sha256": "1d1c30a79c969bcde9eaabe146e5de7e06a80ade443442fbaf88317d4d75c947"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp311-cp311-macosx_10_15_x86_64.whl",
            "has_sig": false,
            "md5_digest": "b068d9c898bc92fdb509baeaca1f42d4",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": "<3.13,>=3.10",
            "size": 45635430,
            "upload_time": "2025-01-13T22:40:26",
            "upload_time_iso_8601": "2025-01-13T22:40:26.732568Z",
            "url": "https://files.pythonhosted.org/packages/e7/8c/76d6e17966514e3f0dbb37c5765bf762a3e3f1712fc250df484f9471a308/bodo-2025.1-cp311-cp311-macosx_10_15_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c58c570da5458c4d5c2b2a1f77805143f524c937dd030f969cc3fade41c10160",
                "md5": "ed3f5bb40ef7b7b339d3ed1cd73a759c",
                "sha256": "1a16b9cabba62ecef97febe230854e3efdcfe1449a5a96979d882c2e7c441c44"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp311-cp311-macosx_12_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "ed3f5bb40ef7b7b339d3ed1cd73a759c",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": "<3.13,>=3.10",
            "size": 31915638,
            "upload_time": "2025-01-13T22:39:55",
            "upload_time_iso_8601": "2025-01-13T22:39:55.450295Z",
            "url": "https://files.pythonhosted.org/packages/c5/8c/570da5458c4d5c2b2a1f77805143f524c937dd030f969cc3fade41c10160/bodo-2025.1-cp311-cp311-macosx_12_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "896006566da23235d0f4efb768ebbf8a925bdaadb4bc830b5fcc87a3dbc08a4a",
                "md5": "bb47e933142324f9de52689dc7272ca6",
                "sha256": "0b5554b5d363d8aefde1195d313424ac82bc007fad4dd3c24ef52cfb931577a5"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp311-cp311-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "bb47e933142324f9de52689dc7272ca6",
            "packagetype": "bdist_wheel",
            "python_version": "cp311",
            "requires_python": "<3.13,>=3.10",
            "size": 47376700,
            "upload_time": "2025-01-13T22:39:02",
            "upload_time_iso_8601": "2025-01-13T22:39:02.842202Z",
            "url": "https://files.pythonhosted.org/packages/89/60/06566da23235d0f4efb768ebbf8a925bdaadb4bc830b5fcc87a3dbc08a4a/bodo-2025.1-cp311-cp311-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "b5a551264c07b6fe72437885001669f574ab123bed597c0cca4f1bfa9a8e9040",
                "md5": "dd275774339580d92927e97bd4b522d3",
                "sha256": "40daaf93cb924c0fb98e888ff2d646d71d3c18cf8798474855281292f408d047"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp312-cp312-macosx_10_15_x86_64.whl",
            "has_sig": false,
            "md5_digest": "dd275774339580d92927e97bd4b522d3",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": "<3.13,>=3.10",
            "size": 45666479,
            "upload_time": "2025-01-13T22:40:36",
            "upload_time_iso_8601": "2025-01-13T22:40:36.014214Z",
            "url": "https://files.pythonhosted.org/packages/b5/a5/51264c07b6fe72437885001669f574ab123bed597c0cca4f1bfa9a8e9040/bodo-2025.1-cp312-cp312-macosx_10_15_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "3472901691321f61d5ee9a62dc83aef7ce4b178574830d3fafedaf7626303a7a",
                "md5": "c1234f9da06f5fe8db63443babe938d9",
                "sha256": "68fc9c676f2f40203bd79105c63caf4bb6791b361bd1a96bbca5d8d6725bdb39"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp312-cp312-macosx_12_0_arm64.whl",
            "has_sig": false,
            "md5_digest": "c1234f9da06f5fe8db63443babe938d9",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": "<3.13,>=3.10",
            "size": 31940603,
            "upload_time": "2025-01-13T22:40:00",
            "upload_time_iso_8601": "2025-01-13T22:40:00.844791Z",
            "url": "https://files.pythonhosted.org/packages/34/72/901691321f61d5ee9a62dc83aef7ce4b178574830d3fafedaf7626303a7a/bodo-2025.1-cp312-cp312-macosx_12_0_arm64.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "8c39111ef51cc308d157ee07a926d8cd2cc6ca4dfa7d1c1f15f6fe2802567fdc",
                "md5": "c700d6d37c066564c339bb62f2f168bb",
                "sha256": "f38b671299db17e2965381e885e631803f4ec7b3570ddd98f94cdb78f1570a71"
            },
            "downloads": -1,
            "filename": "bodo-2025.1-cp312-cp312-manylinux_2_28_x86_64.whl",
            "has_sig": false,
            "md5_digest": "c700d6d37c066564c339bb62f2f168bb",
            "packagetype": "bdist_wheel",
            "python_version": "cp312",
            "requires_python": "<3.13,>=3.10",
            "size": 47338368,
            "upload_time": "2025-01-13T22:39:10",
            "upload_time_iso_8601": "2025-01-13T22:39:10.255610Z",
            "url": "https://files.pythonhosted.org/packages/8c/39/111ef51cc308d157ee07a926d8cd2cc6ca4dfa7d1c1f15f6fe2802567fdc/bodo-2025.1-cp312-cp312-manylinux_2_28_x86_64.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-13 22:40:18",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "bodo-ai",
    "github_project": "Bodo",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "bodo"
}
        