| Field | Value |
| --- | --- |
| Name | docetl |
| Version | 0.2.1 |
| Summary | ETL with LLM operations. |
| Upload time | 2025-01-09 09:11:09 |
| Author | Shreya Shankar |
| Requires Python | `<4.0,>=3.10` |
| License | MIT |
# 📜 DocETL: Powering Complex Document Processing Pipelines
[Website](https://docetl.org) · [Documentation](https://ucbepic.github.io/docetl) · [Discord](https://discord.gg/fHp7B2X3xx) · [Paper](https://arxiv.org/abs/2410.12189)

DocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. It offers:
1. An interactive UI playground for iterative prompt engineering and pipeline development
2. A Python package for running production pipelines from the command line or Python code
### 🌟 Community Projects
- [Conversation Generator](https://github.com/PassionFruits-net/docetl-conversation)
- [Text-to-speech](https://github.com/PassionFruits-net/docetl-speaker)
- [YouTube Transcript Topic Extraction](https://github.com/rajib76/docetl_examples)
### 📚 Educational Resources
- [UI/UX Thoughts](https://x.com/sh_reya/status/1846235904664273201)
- [Using Gleaning to Improve Output Quality](https://x.com/sh_reya/status/1843354256335876262)
- [Deep Dive on Resolve Operator](https://x.com/sh_reya/status/1840796824636121288)
## 🚀 Getting Started
There are two main ways to use DocETL:
### 1. 🎮 DocWrangler, the Interactive UI Playground (Recommended for Development)
[DocWrangler](https://docetl.org/playground) helps you iteratively develop your pipeline:
- Experiment with different prompts and see results in real-time
- Build your pipeline step by step
- Export your finalized pipeline configuration for production use

DocWrangler is hosted at [docetl.org/playground](https://docetl.org/playground). To run the playground locally instead, you can either:
- Use Docker (recommended for quick start): `make docker`
- Set up the development environment manually
See the [Playground Setup Guide](https://ucbepic.github.io/docetl/playground/) for detailed instructions.
### 2. 📦 Python Package (For Production Use)
If you want to use DocETL as a Python package:
#### Prerequisites
- Python 3.10 or later
- An OpenAI API key (or an API key for the LLM provider of your choice)
```bash
pip install docetl
```
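The package metadata lists `requires_python` as `<4.0,>=3.10`. As a hedged, standard-library-only sketch (the helper names are illustrative and not part of DocETL), the script below checks that the current interpreter falls in that range and reports the installed `docetl` version through `importlib.metadata`, returning `None` when the package is not installed:

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Bounds taken from the package metadata: requires_python "<4.0,>=3.10".
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (4, 0)


def python_supported(info=sys.version_info) -> bool:
    """Return True if (major, minor) lies in the half-open range [3.10, 4.0)."""
    current = (info[0], info[1])
    return MIN_PYTHON <= current < MAX_PYTHON_EXCLUSIVE


def installed_docetl_version():
    """Return the installed docetl version string, or None if it is absent."""
    try:
        return version("docetl")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("docetl version:", installed_docetl_version() or "not installed")
```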
Create a `.env` file in your project directory:
```bash
# Required for LLM operations (or use the key for the LLM of your choice)
OPENAI_API_KEY=your_api_key_here
```
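Before running a pipeline, it can help to confirm that `.env` defines a real key rather than the placeholder. The following is a minimal, hypothetical checker, not how DocETL itself loads the file: its parser only handles simple `KEY=VALUE` lines, full-line `#` comments, matching surrounding quotes, and a ` #` inline comment after an unquoted value. Pass a different `key_name` if you use another provider's key.

```python
from pathlib import Path

# Placeholder value used in the README's example .env file.
PLACEHOLDER = "your_api_key_here"


def parse_env_text(text):
    """Parse simple KEY=VALUE lines into a dict (a minimal sketch, not python-dotenv)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, full-line comments, and lines without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # Drop matching surrounding quotes.
        else:
            value = value.split(" #", 1)[0].rstrip()  # Drop an inline comment.
        values[key.strip()] = value
    return values


def check_api_key(env, key_name="OPENAI_API_KEY"):
    """Return True if the key is present, non-empty, and not the README placeholder."""
    value = env.get(key_name, "")
    return bool(value) and value != PLACEHOLDER


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("API key configured:", check_api_key(parse_env_text(env_path.read_text())))
    else:
        print("No .env file found in the current directory")
```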
To see examples of how to use DocETL, check out the [tutorial](https://ucbepic.github.io/docetl/tutorial/).
### 🎮 DocWrangler Setup
To run DocWrangler locally, you have two options:
#### Option A: Using Docker (Recommended for Quick Start)
The easiest way to get the DocWrangler playground running:
1. From a clone of the repository (see step 1 of Option B), create the required environment files:
Create `.env` in the root directory:
```bash
OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
BACKEND_HOST=0.0.0.0
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000
```
Create `.env.local` in the `website` directory:
```bash
OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000
```
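The two files appear to be meant to agree: the example root `.env` sets `BACKEND_PORT=8000` and allows the origin `http://localhost:3000`, while `website/.env.local` points the UI at `NEXT_PUBLIC_BACKEND_PORT=8000`. Assuming the UI calls the backend on that port and `BACKEND_ALLOW_ORIGINS` acts as the backend's CORS allow-list (both inferred from the variable names, not confirmed here), a hypothetical consistency check over the parsed values could look like this; the same check applies to the manual setup in Option B:

```python
def parse_origins(raw):
    """Split a comma-separated origin list, ignoring empty entries."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


def check_playground_config(root_env, website_env):
    """Return a list of likely misconfigurations (empty when none are found).

    Assumes the UI reaches the backend on NEXT_PUBLIC_BACKEND_PORT and that
    the backend only accepts browser requests from BACKEND_ALLOW_ORIGINS.
    """
    problems = []
    backend_port = root_env.get("BACKEND_PORT")
    ui_backend_port = website_env.get("NEXT_PUBLIC_BACKEND_PORT")
    if backend_port != ui_backend_port:
        problems.append(
            f"BACKEND_PORT={backend_port} does not match "
            f"NEXT_PUBLIC_BACKEND_PORT={ui_backend_port}"
        )
    frontend_origin = f"http://localhost:{root_env.get('FRONTEND_PORT', '3000')}"
    if frontend_origin not in parse_origins(root_env.get("BACKEND_ALLOW_ORIGINS", "")):
        problems.append(f"{frontend_origin} is missing from BACKEND_ALLOW_ORIGINS")
    return problems


if __name__ == "__main__":
    # Values copied from the example .env and website/.env.local above.
    root = {
        "BACKEND_ALLOW_ORIGINS": "http://localhost:3000,http://127.0.0.1:3000",
        "BACKEND_PORT": "8000",
        "FRONTEND_PORT": "3000",
    }
    website = {"NEXT_PUBLIC_BACKEND_HOST": "localhost", "NEXT_PUBLIC_BACKEND_PORT": "8000"}
    print(check_playground_config(root, website) or "no problems found")
```

The function takes plain dicts, so any `.env` parser can feed it; only the key names come from the README.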
2. Run Docker:
```bash
make docker
```
This will:
- Create a Docker volume for persistent data
- Build the DocETL image
- Run the container with the UI accessible at http://localhost:3000
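Building the image can take a while, so the UI may not answer on port 3000 right away. A small standard-library polling helper could wait for it; this is a hypothetical sketch that only checks whether a TCP port accepts connections, not whether the application behind it is healthy:

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, deadline_s=60.0, interval_s=1.0):
    """Poll host:port until it accepts connections or `deadline_s` seconds elapse."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if port_is_open(host, port):
            return True
        time.sleep(interval_s)
    return False
```

For example, `wait_for_port("localhost", 3000)` returns `True` once the container's UI port accepts connections, or `False` after 60 seconds.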
To clean up Docker resources (note that this will delete the Docker volume):
```bash
make docker-clean
```
#### Option B: Manual Setup (Development)
For development or if you prefer not to use Docker:
1. Clone the repository:
```bash
git clone https://github.com/ucbepic/docetl.git
cd docetl
```
2. Create a `.env` file in the repository's root (top-level) directory:
```bash
OPENAI_API_KEY=your_api_key_here
BACKEND_ALLOW_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
BACKEND_HOST=localhost
BACKEND_PORT=8000
BACKEND_RELOAD=True
FRONTEND_HOST=0.0.0.0
FRONTEND_PORT=3000
```
Then create a `.env.local` file in the `website` directory:
```bash
OPENAI_API_KEY=sk-xxx
OPENAI_API_BASE=https://api.openai.com/v1
MODEL_NAME=gpt-4o-mini
NEXT_PUBLIC_BACKEND_HOST=localhost
NEXT_PUBLIC_BACKEND_PORT=8000
```
3. Install dependencies:
```bash
make install # Install Python package
make install-ui # Install UI dependencies
```
Note that the OpenAI API key, API base URL, and model name in `website/.env.local` are used only by the UI assistant, not by the DocETL pipeline execution engine.
4. Start the development server:
```bash
make run-ui-dev
```
5. Visit http://localhost:3000/playground to access the interactive UI.
### 🛠️ Development Setup
If you plan to contribute to or modify DocETL, you can verify your setup by running the basic test suite:
```bash
make tests-basic # Runs basic test suite (costs < $0.01 with OpenAI)
```
For detailed documentation and tutorials, visit our [documentation](https://ucbepic.github.io/docetl).
Raw data
{
"_id": null,
"home_page": null,
"name": "docetl",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.10",
"maintainer_email": null,
"keywords": null,
"author": "Shreya Shankar",
"author_email": "shreyashankar@berkeley.edu",
"download_url": "https://files.pythonhosted.org/packages/49/37/cd5181624182be1826d878da60f38ef8efe0c35f24a9127c3217b8322204/docetl-0.2.1.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "ETL with LLM operations.",
"version": "0.2.1",
"project_urls": null,
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "159b028222a50ab9818cd9cae796e6f4c509e2d225ec86bf814a3e33fbad2d8e",
"md5": "3b147e5551716a04eeb45185e0320211",
"sha256": "d0fdb8487883accf09754495239d1c5d132e84a245f5367a9b218539407a3bf6"
},
"downloads": -1,
"filename": "docetl-0.2.1-py3-none-any.whl",
"has_sig": false,
"md5_digest": "3b147e5551716a04eeb45185e0320211",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.10",
"size": 170956,
"upload_time": "2025-01-09T09:11:06",
"upload_time_iso_8601": "2025-01-09T09:11:06.254009Z",
"url": "https://files.pythonhosted.org/packages/15/9b/028222a50ab9818cd9cae796e6f4c509e2d225ec86bf814a3e33fbad2d8e/docetl-0.2.1-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4937cd5181624182be1826d878da60f38ef8efe0c35f24a9127c3217b8322204",
"md5": "abd79e0c374b93d856ddd09d12b64d28",
"sha256": "836174ba94259f9fd4eae0f1b7082f0ad87008e0406d0a48a827f4ce79c870a4"
},
"downloads": -1,
"filename": "docetl-0.2.1.tar.gz",
"has_sig": false,
"md5_digest": "abd79e0c374b93d856ddd09d12b64d28",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.10",
"size": 144629,
"upload_time": "2025-01-09T09:11:09",
"upload_time_iso_8601": "2025-01-09T09:11:09.036806Z",
"url": "https://files.pythonhosted.org/packages/49/37/cd5181624182be1826d878da60f38ef8efe0c35f24a9127c3217b8322204/docetl-0.2.1.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-09 09:11:09",
"github": false,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"lcname": "docetl"
}