palimpzest

Name	palimpzest
Version	0.7.20
Summary	Palimpzest is a system which enables anyone to process AI-powered analytical queries simply by defining them in a declarative language
upload_time	2025-07-23 19:20:06
requires_python	>=3.8
keywords	relational optimization llm ai programming extraction tools document search integration

![pz-banner](https://palimpzest-workloads.s3.us-east-1.amazonaws.com/palimpzest-cropped.png)

# Palimpzest (PZ)
[![Discord](https://img.shields.io/discord/1245561987480420445?logo=discord)](https://discord.gg/dN85JJ6jaH)
[![Docs](https://img.shields.io/badge/Read_the_Docs-purple?logo=readthedocs)](https://palimpzest.org/)
[![Colab Demo](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Fm8I4yL1az395MsFkQbEIZSmUZs0oGvZ?usp=sharing)
[![PyPI](https://img.shields.io/pypi/v/palimpzest)](https://pypi.org/project/palimpzest/)
[![PyPI - Monthly Downloads](https://img.shields.io/pypi/dm/palimpzest?color=teal)](https://pypi.org/project/palimpzest/)
<!-- [![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv)](https://arxiv.org/pdf/2405.14696) -->
<!-- [![Video](https://img.shields.io/badge/YouTube-Talk-red?logo=youtube)](https://youtu.be/T8VQfyBiki0?si=eiph57DSEkDNbEIu) -->

## Learn How to Use PZ
Our [full documentation](https://palimpzest.org) is the definitive resource for learning how to use PZ. It contains all of the installation and quickstart materials on this page, as well as user guides, full API documentation, and much more.

## Getting Started
You can find a stable version of the PZ package on PyPI [here](https://pypi.org/project/palimpzest/). To install the package, run:
```bash
$ pip install palimpzest
```

Alternatively, to install the latest version of PZ from source, clone this repository and install it with `pip`:
```bash
$ git clone https://github.com/mitdbg/palimpzest.git
$ cd palimpzest
$ pip install .
```
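To confirm which version ended up in your environment (the PyPI release or a from-source build), a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of PZ.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version("palimpzest")
    if found is None:
        print("palimpzest is not installed in this environment")
    else:
        print(f"palimpzest {found} is installed")
```

The lookup uses the distribution name (`palimpzest`), so it reports the same version `pip` installed regardless of whether it came from PyPI or a local checkout.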

## Join the PZ Community
We are actively hacking on PZ and would love to have you join our community [![Discord](https://img.shields.io/discord/1245561987480420445?logo=discord)](https://discord.gg/dN85JJ6jaH)

[Our Discord server](https://discord.gg/dN85JJ6jaH) is the best place to:
- Get help with your PZ program(s)
- Give feedback to the maintainers
- Discuss the future direction(s) of the project
- Discuss anything related to data processing with LLMs!

We are eager to learn more about your workloads and use cases, and will take them into consideration in planning our future roadmap.

## Quick Start
The easiest way to get started with Palimpzest is to run the `quickstart.ipynb` Jupyter notebook. It demonstrates the full PZ workflow, including registering a dataset, composing and executing a pipeline, and accessing the results.
To run the notebook, first start a Jupyter server:
```bash
$ jupyter notebook
```
Then open `quickstart.ipynb` from the Jupyter interface in your browser (by default at `localhost:8888`).

### Even Quicker Start
For eager readers, the following snippet condenses the code from the notebook. However, we do suggest reading the notebook, as it provides more insight into each element of the program.
```python
import palimpzest as pz

# define the fields we wish to compute
email_cols = [
    {"name": "sender", "type": str, "desc": "The email address of the sender"},
    {"name": "subject", "type": str, "desc": "The subject of the email"},
    {"name": "date", "type": str, "desc": "The date the email was sent"},
]

# lazily construct the computation to get emails about holidays sent in July
dataset = pz.Dataset("testdata/enron-tiny/")
dataset = dataset.sem_add_columns(email_cols)
dataset = dataset.sem_filter("The email was sent in July")
dataset = dataset.sem_filter("The email is about holidays")

# execute the computation w/the MinCost policy
config = pz.QueryProcessorConfig(policy=pz.MinCost(), verbose=True)
output = dataset.run(config)

# display output (if using Jupyter, otherwise use print(output_df))
output_df = output.to_df(cols=["date", "sender", "subject"])
display(output_df)
```
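Each entry in `email_cols` above is a dict with `name`, `type`, and `desc` keys. When a schema has many fields, it may be convenient to build that list from a compact mapping. The sketch below uses a hypothetical helper, `make_cols` (not part of the PZ API), that only produces the same list-of-dicts shape used in the snippet; it does not call PZ itself.

```python
from typing import Dict, List


def make_cols(descriptions: Dict[str, str], field_type: type = str) -> List[dict]:
    """Build a list of {"name", "type", "desc"} dicts, preserving insertion order."""
    return [
        {"name": name, "type": field_type, "desc": desc}
        for name, desc in descriptions.items()
    ]


# Same three fields as the hand-written email_cols list in the quick start.
email_cols = make_cols({
    "sender": "The email address of the sender",
    "subject": "The subject of the email",
    "date": "The date the email was sent",
})
```

The resulting `email_cols` is equal to the hand-written list, so it could be passed to `dataset.sem_add_columns(...)` in the same way.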

## Python Demos
Below are simple instructions for running PZ on a test dataset of Enron emails, which you can download with a script included in the repository.

### Downloading test data
To run the provided demos, you will need to download the test data. Due to the size of the data, we are unable to include it in the repository. You can download it by running the following commands from the root of the repository in a Unix terminal (requires `wget` and `tar`):
```bash
chmod +x testdata/download-testdata.sh
./testdata/download-testdata.sh
```
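Before running the demos, it can help to confirm that the download script actually produced the directories the examples refer to. The sketch below assumes the data unpacks into `testdata/enron-tiny/` and `testdata/enron-eval-tiny/` (the paths used in the quick-start snippet and the demo command on this page); the helper `missing_paths` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Paths referenced elsewhere in this README (an assumption about where the script unpacks).
EXPECTED_DIRS = ["testdata/enron-tiny", "testdata/enron-eval-tiny"]


def missing_paths(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the entries in `paths` that are not existing directories under `root`."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_dir()]


if __name__ == "__main__":
    missing = missing_paths(EXPECTED_DIRS)
    if missing:
        print("Missing test data directories:", ", ".join(missing))
        print("Run ./testdata/download-testdata.sh from the repository root.")
    else:
        print("All expected test data directories are present.")
```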

### Running the Demos
Set your OpenAI (or Together.ai) API key at the command line:
```bash
# set one (or both) of the following:
export OPENAI_API_KEY=<your-api-key>
export TOGETHER_API_KEY=<your-api-key>
```
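The demos need at least one of these keys. A quick way to see which providers are configured before launching a run is sketched below; the mapping from variable names to provider labels is an assumption based only on the two variables listed above, and `configured_providers` is a hypothetical helper.

```python
import os
from typing import List, Mapping, Optional

# Environment variables mentioned in this README, mapped to a human-readable label.
PROVIDER_KEYS = {
    "OPENAI_API_KEY": "OpenAI",
    "TOGETHER_API_KEY": "Together.ai",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return labels for providers whose key is set to a non-blank value."""
    env = os.environ if env is None else env
    return [label for var, label in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    providers = configured_providers()
    if providers:
        print("Configured providers:", ", ".join(providers))
    else:
        print("Set OPENAI_API_KEY and/or TOGETHER_API_KEY before running the demos.")
```

Passing an explicit mapping instead of reading `os.environ` keeps the check easy to exercise without touching your real environment.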

Now you can run the simple test program with:
```bash
$ python demos/simple-demo.py --task enron --dataset testdata/enron-eval-tiny --verbose
```
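If you want to launch the demo from a Python script (for example, to loop over several datasets), the same command can be assembled as an argument list and passed to `subprocess.run`. This sketch only reuses the flags shown above (`--task`, `--dataset`, `--verbose`); `demo_command` and `run_demo` are hypothetical helpers, and the sketch assumes it is run from the repository root.

```python
import subprocess
import sys
from typing import List


def demo_command(task: str, dataset: str, verbose: bool = True) -> List[str]:
    """Build the argv for demos/simple-demo.py using the current Python interpreter."""
    cmd = [sys.executable, "demos/simple-demo.py", "--task", task, "--dataset", dataset]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_demo(task: str, dataset: str, verbose: bool = True) -> int:
    """Run the demo and return its exit code (output streams to this terminal)."""
    return subprocess.run(demo_command(task, dataset, verbose), check=False).returncode


# Example (requires the repository checkout, the test data, and an API key):
# run_demo("enron", "testdata/enron-eval-tiny")
```

Using `sys.executable` rather than a bare `python` makes the demo run under the same interpreter (and virtual environment) where PZ was installed.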

## Citation
If you would like to cite our work, please use the following citation:
```bibtex
@inproceedings{palimpzestCIDR,
    title={Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing},
    author={Liu, Chunwei and Russo, Matthew and Cafarella, Michael and Cao, Lei and Chen, Peter Baile and Chen, Zui and Franklin, Michael and Kraska, Tim and Madden, Samuel and Shahout, Rana and Vitagliano, Gerardo},
    booktitle = {Proceedings of the {{Conference}} on {{Innovative Database Research}} ({{CIDR}})},
    year = {2025},
}
```

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "palimpzest",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "relational, optimization, llm, AI programming, extraction, tools, document, search, integration",
    "author": null,
    "author_email": "MIT DSG Semantic Management Lab <michjc@csail.mit.edu>",
    "download_url": "https://files.pythonhosted.org/packages/c4/5d/cb4214acb28e8bf41fd4faade4995e933d26344a5db4bbff11dc4945d6f6/palimpzest-0.7.20.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Palimpzest is a system which enables anyone to process AI-powered analytical queries simply by defining them in a declarative language",
    "version": "0.7.20",
    "project_urls": {
        "documentation": "https://palimpzest.org",
        "homepage": "https://palimpzest.org",
        "repository": "https://github.com/mitdbg/palimpzest/"
    },
    "split_keywords": [
        "relational",
        " optimization",
        " llm",
        " ai programming",
        " extraction",
        " tools",
        " document",
        " search",
        " integration"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "98e569a564fd9805090ba659f79e6d341c7bae7e7669a08e30bd09aaecb96da5",
                "md5": "a9e1699b5dd26cf1d6d6de93aab6fbed",
                "sha256": "d38e29c281b908e7801eb5ad2a089e0fd173a34317098fd94502cae64db4da4f"
            },
            "downloads": -1,
            "filename": "palimpzest-0.7.20-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "a9e1699b5dd26cf1d6d6de93aab6fbed",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 192306,
            "upload_time": "2025-07-23T19:20:04",
            "upload_time_iso_8601": "2025-07-23T19:20:04.830258Z",
            "url": "https://files.pythonhosted.org/packages/98/e5/69a564fd9805090ba659f79e6d341c7bae7e7669a08e30bd09aaecb96da5/palimpzest-0.7.20-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "c45dcb4214acb28e8bf41fd4faade4995e933d26344a5db4bbff11dc4945d6f6",
                "md5": "c84290740822af6bd90196ce36c46dc8",
                "sha256": "94d194cdecd7397601b6d8ff74f0231a28cace5473d59c3eede3f793b7c089b3"
            },
            "downloads": -1,
            "filename": "palimpzest-0.7.20.tar.gz",
            "has_sig": false,
            "md5_digest": "c84290740822af6bd90196ce36c46dc8",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 152865,
            "upload_time": "2025-07-23T19:20:06",
            "upload_time_iso_8601": "2025-07-23T19:20:06.142187Z",
            "url": "https://files.pythonhosted.org/packages/c4/5d/cb4214acb28e8bf41fd4faade4995e933d26344a5db4bbff11dc4945d6f6/palimpzest-0.7.20.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-07-23 19:20:06",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "mitdbg",
    "github_project": "palimpzest",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "palimpzest"
}
