| Field | Value |
| --- | --- |
| Name | bodo |
| Version | 2024.12.2 |
| home_page | None |
| Summary | High-Performance Python Compute Engine for Data and AI |
| upload_time | 2024-12-19 19:56:07 |
| maintainer | None |
| docs_url | None |
| author | Bodo.ai |
| requires_python | <3.13,>=3.10 |
| license | None |
| keywords | data, analytics, cluster |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |
![Logo](Assets/bodo.png)
<h3 align="center">
<a href="https://docs.bodo.ai/latest/" target="_blank"><b>Docs</b></a>
·
<a href="https://bodocommunity.slack.com/join/shared_invite/zt-qwdc8fad-6rZ8a1RmkkJ6eOX1X__knA#/shared-invite/email" target="_blank"><b>Slack</b></a>
·
<a href="https://www.bodo.ai/benchmarks/" target="_blank"><b>Benchmarks</b></a>
</h3>
# Bodo: High-Performance Python Compute Engine for Data and AI
Bodo is a cutting-edge compute engine for large-scale Python data processing. Powered by an innovative auto-parallelizing just-in-time compiler, Bodo transforms Python programs into highly optimized, parallel binaries without requiring code rewrites. On the NYC Taxi benchmark, this makes Bodo [20x to 240x faster](https://github.com/bodo-ai/Bodo/tree/main/benchmarks/nyc_taxi) than alternatives!
<img src="benchmarks/img/nyc-taxi-benchmark.png" alt="NYC Taxi Benchmark" width="500"/>
Unlike traditional distributed computing frameworks, Bodo:
- Seamlessly supports native Python APIs like Pandas and NumPy.
- Eliminates runtime overheads common in driver-executor models by leveraging Message Passing Interface (MPI) technology for true distributed execution.
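To make the contrast with driver-executor models more concrete, the sketch below imitates the single-program, multiple-data style that MPI programs typically follow: every rank runs the same function on its own block of rows, and the partial results are combined at the end. It is a conceptual illustration in plain Python, not Bodo's implementation. The names `block_bounds` and `local_sum` and the simulated rank loop are hypothetical and introduced only for this example.

```python
def block_bounds(n_rows, rank, n_ranks):
    """Return the [start, stop) row range owned by `rank` in a 1D block distribution."""
    base, extra = divmod(n_rows, n_ranks)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_sum(values, rank, n_ranks):
    """Work done independently by one rank: sum only the rows it owns."""
    start, stop = block_bounds(len(values), rank, n_ranks)
    return sum(values[start:stop])


# Simulate 4 ranks inside one process. With real MPI, each rank would be its
# own process and the final sum would be a collective reduction.
values = list(range(10))
partials = [local_sum(values, rank, 4) for rank in range(4)]
total = sum(partials)
```

No central driver hands out tasks here: each rank derives its share of the data from its own rank number, which is the property the bullet above refers to.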
## Goals
Bodo makes Python run much (much!) faster than it normally does!
1. **Exceptional Performance:**
Deliver HPC-grade performance and scalability for Python data workloads as if the code were written in C++/MPI, whether running on a laptop or across large cloud clusters.
2. **Easy to Use:**
Easily integrate into Python workflows with a simple decorator, and support native Pandas and NumPy APIs.
3. **Interoperable:**
Compatible with the regular Python ecosystem, and can selectively speed up only the functions that Bodo supports.
4. **Integration with Modern Data Infrastructure:**
Provide robust support for industry-leading data platforms like Apache Iceberg and Snowflake, enabling smooth interoperability with existing ecosystems.
## Non-goals
1. *Full Python Language Support:*
We are currently focused on a targeted subset of Python used for data-intensive and computationally heavy workloads, rather than supporting the entire Python syntax and all library APIs.
2. *Non-Data Workloads:*
Prioritize applications in data engineering, data science, and AI/ML. Bodo is not designed for general-purpose use cases that are non-data-centric.
3. *Real-time Compilation:*
While compilation time is improving, Bodo is not yet optimized for scenarios requiring very short compilation times (e.g., workloads with execution times of only a few seconds).
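The trade-off behind this last non-goal is simple arithmetic: compilation is a one-time cost that only pays off when the time saved across runs exceeds it. The sketch below is a hedged illustration of that break-even reasoning. The function names and all timings in the usage note are hypothetical, not measurements of Bodo.

```python
def total_time(run_seconds, runs, compile_seconds=0.0):
    """Total wall time for `runs` executions plus a one-time compilation cost."""
    return compile_seconds + run_seconds * runs


def jit_pays_off(python_run_s, jit_run_s, compile_s, runs=1):
    """True when compiling first is faster overall than running plain Python."""
    return total_time(jit_run_s, runs, compile_s) < total_time(python_run_s, runs)
```

With made-up numbers, a job that takes 3 s in plain Python and 0.2 s once compiled, but needs 10 s to compile, loses on a single run (`jit_pays_off(3.0, 0.2, 10.0)` is false) and wins from the fourth run onward (`jit_pays_off(3.0, 0.2, 10.0, runs=4)` is true).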
## Key Features
- Automatic optimization & parallelization of Python programs using Pandas and NumPy.
- Linear scalability from laptops to large-scale clusters and supercomputers.
- Advanced scalable I/O support for Iceberg, Snowflake, Parquet, CSV, and JSON with automatic filter pushdown and column pruning for optimized data access.
- High-performance SQL engine that is natively integrated into Python.
See the Bodo documentation to learn more: https://docs.bodo.ai/
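For readers unfamiliar with the terms in the I/O bullet, here is a minimal sketch of what filter pushdown and column pruning mean. It uses only the standard library and a tiny in-memory CSV. The `scan` helper and the sample data are hypothetical, and this is not how Bodo implements either optimization.

```python
import csv
import io

CSV_TEXT = "id,city,fare,tip\n1,NYC,12.5,2.0\n2,SF,30.0,5.0\n3,NYC,8.0,0.0\n"


def scan(text, columns, predicate):
    """Read only `columns`, dropping rows that fail `predicate` during the scan."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Filter pushdown: rejected rows are never kept in the result.
        if predicate(record):
            # Column pruning: keep only the columns the query needs.
            rows.append({name: record[name] for name in columns})
    return rows


nyc_fares = scan(CSV_TEXT, columns=["fare"], predicate=lambda r: r["city"] == "NYC")
```

The idea is that the filter and the column selection are applied while reading, rather than loading everything and discarding most of it afterwards. With columnar formats such as Parquet, a reader can go further and skip unneeded columns and row groups on disk, which a line-oriented CSV scan cannot do.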
## Installation
Bodo can be installed using Pip or Conda:
```bash
pip install -U bodo
```
or
```bash
conda create -n Bodo python=3.12 -c conda-forge
conda activate Bodo
conda install bodo -c bodo.ai -c conda-forge
```
Bodo currently supports Linux x86 and macOS on both x86 and ARM. Windows support (and more) is coming soon!
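Before installing, it can help to confirm that the interpreter and platform fall inside the supported range. The following is a small sketch based on the `requires_python` constraint (`<3.13,>=3.10`) and the platforms listed above. The `is_supported` helper is hypothetical, and the machine-name strings are assumptions about what `platform.machine()` commonly reports.

```python
import platform
import sys


def is_supported(version_info, system, machine):
    """Check interpreter and platform against the constraints stated for this release."""
    python_ok = (3, 10) <= tuple(version_info[:2]) < (3, 13)
    machine = machine.lower()
    if system == "Linux":
        platform_ok = machine in {"x86_64", "amd64"}
    elif system == "Darwin":  # macOS
        platform_ok = machine in {"x86_64", "arm64"}
    else:
        platform_ok = False
    return python_ok and platform_ok


if __name__ == "__main__":
    ok = is_supported(sys.version_info, platform.system(), platform.machine())
    print("Environment matches the stated Bodo 2024.12.2 constraints:", ok)
```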
## Example Code
Here is example Pandas code that reads and processes a sample Parquet dataset with Bodo.
```python
import pandas as pd
import numpy as np
import bodo
import time

# Generate sample data
NUM_GROUPS = 30
NUM_ROWS = 20_000_000

df = pd.DataFrame({
    "A": np.arange(NUM_ROWS) % NUM_GROUPS,
    "B": np.arange(NUM_ROWS)
})
df.to_parquet("my_data.pq")

# Compile the function with Bodo; cache=True reuses the compiled result on later runs
@bodo.jit(cache=True)
def computation():
    t1 = time.time()
    df = pd.read_parquet("my_data.pq")
    # Row-wise integer division B // A, guarding against A == 0
    df2 = pd.DataFrame({"A": df.apply(lambda r: 0 if r.A == 0 else (r.B // r.A), axis=1)})
    df2.to_parquet("out.pq")
    print("Execution time:", time.time() - t1)

computation()
```
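To see what the row-wise lambda in `computation` produces, the sketch below applies the same expression to a handful of rows in plain Python, without Pandas or Bodo. The `row_result` helper and the chosen row indices are illustrative only.

```python
NUM_GROUPS = 30


def row_result(a, b):
    """Same expression as the lambda in `computation`: guard against division by zero."""
    return 0 if a == 0 else b // a


# Build a few rows exactly as the sample data does: A = i % NUM_GROUPS, B = i.
rows = [(i % NUM_GROUPS, i) for i in (0, 1, 7, 30, 31, 65)]
results = [row_result(a, b) for a, b in rows]
# results == [0, 1, 1, 0, 31, 13]
```

Rows whose index is a multiple of `NUM_GROUPS` have `A == 0` and therefore map to 0. All other rows get the floor division `B // A`.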
## How to Contribute
Please read our latest [project contribution guide](CONTRIBUTING.md).
## Getting involved
Join our community and collaborate with other contributors on our [Slack channel](https://bodocommunity.slack.com/join/shared_invite/zt-qwdc8fad-6rZ8a1RmkkJ6eOX1X__knA#/shared-invite/email). We’re excited to hear your ideas and help you get started!
Raw data
{
"_id": null,
"home_page": null,
"name": "bodo",
"maintainer": null,
"docs_url": null,
"requires_python": "<3.13,>=3.10",
"maintainer_email": null,
"keywords": "data, analytics, cluster",
"author": "Bodo.ai",
"author_email": null,
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "High-Performance Python Compute Engine for Data and AI",
"version": "2024.12.2",
"project_urls": {
"Documentation": "https://docs.bodo.ai",
"Homepage": "https://bodo.ai",
"Repository": "https://github.com/bodo-ai/Bodo"
},
"split_keywords": [
"data",
" analytics",
" cluster"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "756ebb88ad47e137642e30f082e1ceaefd38899abc1dde3a0d7492da58537dce",
"md5": "c39e3de2c5f606d0ccca1f6806b14eea",
"sha256": "4afd27f130b9e5512a0e92ab1da9916d981ed252ddb9ad416dbd2166614d8c7e"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp310-cp310-macosx_10_15_x86_64.whl",
"has_sig": false,
"md5_digest": "c39e3de2c5f606d0ccca1f6806b14eea",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": "<3.13,>=3.10",
"size": 45605984,
"upload_time": "2024-12-19T19:56:07",
"upload_time_iso_8601": "2024-12-19T19:56:07.042356Z",
"url": "https://files.pythonhosted.org/packages/75/6e/bb88ad47e137642e30f082e1ceaefd38899abc1dde3a0d7492da58537dce/bodo-2024.12.2-cp310-cp310-macosx_10_15_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3d2f3a630b9da212cc209535088a5cd6e80f6d55bba403ae6089e4bd0fb6019d",
"md5": "69c2bd84ff7aa4edd601d5bfdde715de",
"sha256": "9c2fbc2b1a90da8c37238fb512d3cbbad235d96a624a9f107af53c05c1844f1c"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp310-cp310-macosx_12_0_arm64.whl",
"has_sig": false,
"md5_digest": "69c2bd84ff7aa4edd601d5bfdde715de",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": "<3.13,>=3.10",
"size": 31892927,
"upload_time": "2024-12-19T19:56:13",
"upload_time_iso_8601": "2024-12-19T19:56:13.103326Z",
"url": "https://files.pythonhosted.org/packages/3d/2f/3a630b9da212cc209535088a5cd6e80f6d55bba403ae6089e4bd0fb6019d/bodo-2024.12.2-cp310-cp310-macosx_12_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b5abf0d1209f6ff39d5dcbedcaab422384617e37e966028f91a0a7807bfa9a7d",
"md5": "ce571d615ae79032925f145796ed561f",
"sha256": "a49ffd59e48036b70308453a00bc5f17586f399ab115700444f222a2917a38f8"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp310-cp310-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "ce571d615ae79032925f145796ed561f",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": "<3.13,>=3.10",
"size": 44647059,
"upload_time": "2024-12-19T19:56:24",
"upload_time_iso_8601": "2024-12-19T19:56:24.163271Z",
"url": "https://files.pythonhosted.org/packages/b5/ab/f0d1209f6ff39d5dcbedcaab422384617e37e966028f91a0a7807bfa9a7d/bodo-2024.12.2-cp310-cp310-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "530ec9163c3e485705de621635bd00b8ca15a7c66042b996cfc67dbf6397202b",
"md5": "4541d589c0cb76f63c0e778ff72aee7d",
"sha256": "7acfdbf1a785b68e23255172cdbd998fea2a194d1592016a3141a09f9748b677"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp311-cp311-macosx_10_15_x86_64.whl",
"has_sig": false,
"md5_digest": "4541d589c0cb76f63c0e778ff72aee7d",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": "<3.13,>=3.10",
"size": 45601362,
"upload_time": "2024-12-19T19:56:32",
"upload_time_iso_8601": "2024-12-19T19:56:32.534821Z",
"url": "https://files.pythonhosted.org/packages/53/0e/c9163c3e485705de621635bd00b8ca15a7c66042b996cfc67dbf6397202b/bodo-2024.12.2-cp311-cp311-macosx_10_15_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c7b3a03b6d9281c4f4b125f6040f79552a27b8a3f99fd7cdfa6915003103975a",
"md5": "1ca69dc7e1b1df28d582d20ccb34f9a2",
"sha256": "ef7522d80bc6a097e687ed4dd4e1094e0c47355a8736aadf26fc06ac8f867759"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp311-cp311-macosx_12_0_arm64.whl",
"has_sig": false,
"md5_digest": "1ca69dc7e1b1df28d582d20ccb34f9a2",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": "<3.13,>=3.10",
"size": 31887766,
"upload_time": "2024-12-19T19:56:38",
"upload_time_iso_8601": "2024-12-19T19:56:38.913056Z",
"url": "https://files.pythonhosted.org/packages/c7/b3/a03b6d9281c4f4b125f6040f79552a27b8a3f99fd7cdfa6915003103975a/bodo-2024.12.2-cp311-cp311-macosx_12_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6770654c252ce36bc1d0b98f001e0b0ae70ee5e51128eae6047fcb50e3ee8628",
"md5": "b16e8af9a014dd72a433e72453aa4239",
"sha256": "9af55bb6535cdc9299dced75c5239ff1015a2e92461e3c0c7b5dc4ce5ff9332b"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp311-cp311-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "b16e8af9a014dd72a433e72453aa4239",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": "<3.13,>=3.10",
"size": 44827275,
"upload_time": "2024-12-19T19:56:50",
"upload_time_iso_8601": "2024-12-19T19:56:50.412768Z",
"url": "https://files.pythonhosted.org/packages/67/70/654c252ce36bc1d0b98f001e0b0ae70ee5e51128eae6047fcb50e3ee8628/bodo-2024.12.2-cp311-cp311-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c676043185869f90c130421eef22a81e8368e07bde4b624f441c87440deee687",
"md5": "549ada8a487e8fdb77137b146c9a5578",
"sha256": "070329dd342538d7b5101f6ed7434128cf8e2262df8710f6b2d686b53b5d60b3"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp312-cp312-macosx_10_15_x86_64.whl",
"has_sig": false,
"md5_digest": "549ada8a487e8fdb77137b146c9a5578",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": "<3.13,>=3.10",
"size": 45603948,
"upload_time": "2024-12-19T19:57:00",
"upload_time_iso_8601": "2024-12-19T19:57:00.941482Z",
"url": "https://files.pythonhosted.org/packages/c6/76/043185869f90c130421eef22a81e8368e07bde4b624f441c87440deee687/bodo-2024.12.2-cp312-cp312-macosx_10_15_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "baf1b1257586354f02464119378512d665cb9d48d5c42fa3be1004eaa04a8b02",
"md5": "66503ea3e00b82453ae2a2bc20f8028e",
"sha256": "599efa2e7267e3743587b721a5c828251363dd17bc2fb7c20b267dd1585b00aa"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp312-cp312-macosx_12_0_arm64.whl",
"has_sig": false,
"md5_digest": "66503ea3e00b82453ae2a2bc20f8028e",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": "<3.13,>=3.10",
"size": 31887157,
"upload_time": "2024-12-19T19:57:08",
"upload_time_iso_8601": "2024-12-19T19:57:08.729897Z",
"url": "https://files.pythonhosted.org/packages/ba/f1/b1257586354f02464119378512d665cb9d48d5c42fa3be1004eaa04a8b02/bodo-2024.12.2-cp312-cp312-macosx_12_0_arm64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c65753708525fbc09530cb5b98d8caa64810ef652a065b1fd54b11fa049e7cb1",
"md5": "ed2d1689472cfcebe6d2f11fdced69db",
"sha256": "793536986918315b3a74beb5ac5cd16b4b90ca3ccd0754699260e807779c0a6a"
},
"downloads": -1,
"filename": "bodo-2024.12.2-cp312-cp312-manylinux_2_35_x86_64.whl",
"has_sig": false,
"md5_digest": "ed2d1689472cfcebe6d2f11fdced69db",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": "<3.13,>=3.10",
"size": 44695319,
"upload_time": "2024-12-19T19:57:15",
"upload_time_iso_8601": "2024-12-19T19:57:15.721201Z",
"url": "https://files.pythonhosted.org/packages/c6/57/53708525fbc09530cb5b98d8caa64810ef652a065b1fd54b11fa049e7cb1/bodo-2024.12.2-cp312-cp312-manylinux_2_35_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-19 19:56:07",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "bodo-ai",
"github_project": "Bodo",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "bodo"
}
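Each entry under `urls` in the raw data lists `sha256`, `md5`, and `blake2b_256` digests for one wheel. A minimal sketch of checking a downloaded file against the listed `sha256` value might look like the following. `sha256_of` and `matches_digest` are hypothetical helper names, and the local file path in the usage note is assumed.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_sha256):
    """Compare a file's SHA-256 with the hex digest listed in the metadata."""
    return sha256_of(path) == expected_sha256.lower()
```

For example, assuming the Linux wheel for Python 3.12 was saved in the current directory, `matches_digest("bodo-2024.12.2-cp312-cp312-manylinux_2_35_x86_64.whl", "793536986918315b3a74beb5ac5cd16b4b90ca3ccd0754699260e807779c0a6a")` should return `True` for an intact download.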