pixeltable


- **Name**: pixeltable
- **Version**: 0.2.28
- **Home page**: https://pixeltable.com/
- **Summary**: AI Data Infrastructure: Declarative, Multimodal, and Incremental
- **Upload time**: 2024-12-11 01:42:33
- **Author**: Pixeltable, Inc. (contact@pixeltable.com)
- **Requires Python**: `<4.0,>=3.9`
- **License**: Apache-2.0
- **Keywords**: data-science, machine-learning, database, ai, computer-vision, chatbot, ml, artificial-intelligence, feature-engineering, multimodal, mlops, feature-store, vector-database, llm, genai
<div align="center">
<img src="https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png"
     alt="Pixeltable" width="50%" />
<br>

<h2>AI Data Infrastructure: Declarative, Multimodal, and Incremental</h2>

[![License](https://img.shields.io/badge/License-Apache%202.0-0530AD.svg)](https://opensource.org/licenses/Apache-2.0)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pixeltable?logo=python&logoColor=white&)
![Platform Support](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-E5DDD4)
<br>
[![tests status](https://github.com/pixeltable/pixeltable/actions/workflows/pytest.yml/badge.svg)](https://github.com/pixeltable/pixeltable/actions/workflows/pytest.yml)
[![tests status](https://github.com/pixeltable/pixeltable/actions/workflows/nightly.yml/badge.svg)](https://github.com/pixeltable/pixeltable/actions/workflows/nightly.yml)
[![PyPI Package](https://img.shields.io/pypi/v/pixeltable?color=4D148C)](https://pypi.org/project/pixeltable/)
[![My Discord (1306431018890166272)](https://img.shields.io/badge/💬-Discord-%235865F2.svg)](https://discord.gg/QPyqFYx2UN)
<a target="_blank" href="https://huggingface.co/Pixeltable">
  <img src="https://img.shields.io/badge/🤗-HF Space-FF7D04" alt="Visit our Hugging Face space"/>
</a>

[Installation](https://docs.pixeltable.com/docs/installation) |
[Documentation](https://pixeltable.readme.io/) |
[API Reference](https://pixeltable.github.io/pixeltable/) |
[Code Samples](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#-code-samples) |
[Computer Vision](https://docs.pixeltable.com/docs/object-detection-in-videos) |
[LLM](https://docs.pixeltable.com/docs/document-indexing-and-rag)
</div>

Pixeltable is a Python library providing a declarative interface for multimodal data (text, images, audio, video).
It features built-in versioning, lineage tracking, and incremental updates, enabling users to **store**, **transform**,
**index**, and **iterate** on data for their ML workflows.

Data transformations, model inference, and custom logic are embedded as **computed columns**.

- **Load/Query all data types**: Interact with
    [video data](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#import-media-data-into-pixeltable-videos-images-audio)
    at the [frame level](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#text-and-image-similarity-search-on-video-frames-with-embedding-indexes)
    and documents at the [chunk level](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#automate-data-operations-with-views-eg-split-documents-into-chunks)
- **Incremental updates for data transformation**: Maintain an
    [embedding index](https://docs.pixeltable.com/docs/embedding-vector-indexes) colocated with your data
- **Lazy evaluation and cache management**: Eliminates the need for
    [manual frame extraction](https://docs.pixeltable.com/docs/object-detection-in-videos)
- **Integrates with any Python library**: Use
    [built-in and custom functions (UDFs)](https://docs.pixeltable.com/docs/user-defined-functions-udfs)
    without complex pipelines
- **Data-format agnostic and extensible**: Access tables as Parquet files,
    [PyTorch datasets](https://pixeltable.github.io/pixeltable/api/data-frame/#pixeltable.DataFrame.to_pytorch_dataset),
    or [COCO annotations](https://pixeltable.github.io/pixeltable/api/table/#pixeltable.Table.to_coco_dataset)

## 💾 Installation

```bash
pip install pixeltable
```

**Pixeltable is persistent. Unlike in-memory Python libraries such as Pandas, Pixeltable is a database.**

## 💡 Getting Started

Learn how to create tables, populate them with data, and enhance them with built-in or user-defined transformations.

| Topic | Notebook | Topic | Notebook |
|:----------|:-----------------|:-------------------------|:---------------------------------:|
| 10-Minute Tour of Pixeltable    | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/pixeltable-basics.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> | Tables and Data Operations    | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/fundamentals/tables-and-data-operations.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
| User-Defined Functions (UDFs)    | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/udfs-in-pixeltable.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> | Object Detection Models | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/use-cases/object-detection-in-videos.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
| Incremental Prompt Engineering | <a target="_blank" href="https://colab.research.google.com/github/mistralai/cookbook/blob/main/third_party/Pixeltable/incremental_prompt_engineering_and_model_comparison.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> | Working with External Files    | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/working-with-external-files.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
| Integrating with Label Studio    | <a target="_blank" href="https://pixeltable.readme.io/docs/label-studio"> <img src="https://img.shields.io/badge/📚 Documentation-013056" alt="Visit our documentation"/></a> | Audio/Video Transcript Indexing    | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/use-cases/audio-transcriptions.ipynb">  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
| Multimodal Application    | <a target="_blank" href="https://huggingface.co/spaces/Pixeltable/Multimodal-Powerhouse"> <img src="https://img.shields.io/badge/🤗-Gradio App-FF7D04" alt="Visit our Hugging Face Space"/></a> | Document Indexing and RAG    | <a target="_blank" href="https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/use-cases/rag-demo.ipynb">  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a> |
| Context-Aware Discord Bot    | <a target="_blank" href="https://github.com/pixeltable/pixeltable/blob/main/docs/sample-apps/context-aware-discord-bot"> <img src="https://img.shields.io/badge/%F0%9F%92%AC-Discord Bot-%235865F2.svg" alt="View the Discord bot sample app"/></a> | Image/Text Similarity Search  | <a target="_blank" href="https://github.com/pixeltable/pixeltable/tree/main/docs/sample-apps/text-and-image-similarity-search-nextjs-fastapi">  <img src="https://img.shields.io/badge/🖥️-Next.js + FastAPI-black.svg" alt="View the Next.js + FastAPI sample app"/> </a> |

## 🧱 Code Samples

### Import media data into Pixeltable (videos, images, audio...)

```python
import pixeltable as pxt

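# Create the directory that the table path 'external_data.videos' refers to
pxt.create_dir('external_data')
v = pxt.create_table('external_data.videos', {'video': pxt.Video})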

prefix = 's3://multimedia-commons/'
paths = [
    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',
    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',
    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'
]
v.insert({'video': prefix + p} for p in paths)
```

Learn how to [work with data in Pixeltable](https://pixeltable.readme.io/docs/working-with-external-files).

### Object detection in images using DETR model

```python
import pixeltable as pxt
from pixeltable.functions import huggingface

# Create a table to store data persistently
t = pxt.create_table('image', {'image': pxt.Image})

# Insert some images
prefix = 'https://upload.wikimedia.org/wikipedia/commons'
paths = [
    '/1/15/Cat_August_2010-4.jpg',
    '/e/e1/Example_of_a_Dog.jpg',
    '/thumb/b/bf/Bird_Diversity_2013.png/300px-Bird_Diversity_2013.png'
]
t.insert({'image': prefix + p} for p in paths)

# Add a computed column that runs object detection on each image
t.add_computed_column(classification=huggingface.detr_for_object_detection(
    t.image,
    model_id='facebook/detr-resnet-50'
))

# Retrieve the rows where cats have been identified
t.select(animal = t.image,
         classification = t.classification.label_text[0]) \
.where(t.classification.label_text[0]=='cat').head()
```

Learn about computed columns and object detection:
[Comparing object detection models](https://pixeltable.readme.io/docs/object-detection-in-videos).
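
The query above inspects only the first detection of each image (`label_text[0]`). As a rough sketch of how all detections could be checked against a confidence threshold in plain Python, assume the detection output is a dictionary with parallel `label_text` and `scores` lists. That layout, the helper `labels_above`, and the sample values are assumptions made for illustration, not something the source confirms.

```python
from typing import Any


def labels_above(detections: dict[str, Any], threshold: float) -> list[str]:
    """Return the labels of all detections whose score is at least `threshold`."""
    return [
        label
        for label, score in zip(detections["label_text"], detections["scores"])
        if score >= threshold
    ]


# Hypothetical detection output for one image (values made up for illustration).
sample = {
    "scores": [0.98, 0.42, 0.91],
    "label_text": ["cat", "couch", "remote"],
}
confident = labels_above(sample, threshold=0.9)
print(confident)
```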

### Extend Pixeltable's capabilities with user-defined functions

```python
import PIL.Image
import PIL.ImageDraw
import pixeltable as pxt

@pxt.udf
def draw_boxes(img: PIL.Image.Image, boxes: list[list[float]]) -> PIL.Image.Image:
    result = img.copy()  # Create a copy of `img`
    d = PIL.ImageDraw.Draw(result)
    for box in boxes:
        d.rectangle(box, width=3)  # Draw bounding box rectangles on the copied image
    return result
```

Learn more about user-defined functions:
[UDFs in Pixeltable](https://pixeltable.readme.io/docs/user-defined-functions-udfs).

### Automate data operations with views, e.g., split documents into chunks

```python
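import pixeltable as pxt
from pixeltable.iterators import DocumentSplitter

# `documents_table` is assumed to be an existing table with a `document` column.
# The view is defined by iterating over the chunks produced by a DocumentSplitter.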
chunks_table = pxt.create_view(
    'rag_demo.chunks',
    documents_table,
    iterator=DocumentSplitter.create(
        document=documents_table.document,
        separators='token_limit', limit=300)
)
```

Learn how to leverage views to build your
[RAG workflow](https://pixeltable.readme.io/docs/document-indexing-and-rag).
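
To illustrate what splitting by a token limit means, here is a minimal sketch in plain Python. It approximates tokens with whitespace-separated words, which is an assumption; a real splitter such as `DocumentSplitter` may use a proper tokenizer and respect document structure. `split_by_token_limit` is a hypothetical helper, not part of Pixeltable.

```python
def split_by_token_limit(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` whitespace-separated tokens."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    tokens = text.split()
    # Step through the token list in strides of `limit` and rejoin each slice.
    return [" ".join(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]


chunks = split_by_token_limit("one two three four five six seven", limit=3)
print(chunks)
```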

### Evaluate model performance

```python
from pixeltable.functions.vision import mean_ap

# The mAP metric becomes a query over the evaluation columns already stored in frames_view
frames_view.select(mean_ap(frames_view.eval_yolox_tiny), mean_ap(frames_view.eval_yolox_m)).show()
```

Learn how to leverage Pixeltable for [Model analytics](https://pixeltable.readme.io/docs/object-detection-in-videos).
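
For readers unfamiliar with the metric, the following plain-Python sketch shows one common, non-interpolated way to compute average precision per class and then average it across classes. It is an illustration only: Pixeltable's `mean_ap` may use a different matching or interpolation scheme, and the function names and sample data here are hypothetical.

```python
def average_precision(is_true_positive: list[bool], num_ground_truth: int) -> float:
    """Average precision for detections already sorted by descending confidence."""
    if num_ground_truth == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_tp in enumerate(is_true_positive, start=1):
        if is_tp:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_ground_truth


def mean_average_precision(per_class: dict[str, tuple[list[bool], int]]) -> float:
    """Mean of the per-class average precision values."""
    aps = [average_precision(flags, n_gt) for flags, n_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Made-up example: per class, the true/false-positive flags and ground-truth count.
results = {
    "cat": ([True, False, True], 2),
    "dog": ([False, True], 1),
}
m_ap = mean_average_precision(results)
print(round(m_ap, 4))
```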

### Working with inference services

```python
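import pixeltable as pxt
from pixeltable.functions.together import chat_completions

# Create the directory that the table path 'together_demo.chat' refers to
pxt.create_dir('together_demo')
chat_table = pxt.create_table('together_demo.chat', {'input': pxt.String})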

# The chat-completions API expects JSON-formatted input:
messages = [{'role': 'user', 'content': chat_table.input}]

# This example shows how additional parameters from the Together API can be used in Pixeltable
chat_table.add_computed_column(
    output=chat_completions(
        messages=messages,
        model='mistralai/Mixtral-8x7B-Instruct-v0.1',
        max_tokens=300,
        stop=['\n'],
        temperature=0.7,
        top_p=0.9,
        top_k=40,
        repetition_penalty=1.1,
        logprobs=1,
        echo=True
    )
)
chat_table.add_computed_column(
    response=chat_table.output.choices[0].message.content
)

# Start a conversation
chat_table.insert([
    {'input': 'How many species of felids have been classified?'},
    {'input': 'Can you make me a coffee?'}
])
chat_table.select(chat_table.input, chat_table.response).head()
```

Learn how to interact with inference services such as [Together AI](https://pixeltable.readme.io/docs/together-ai) in Pixeltable.

### Text and image similarity search on video frames with embedding indexes

```python
import pixeltable as pxt
from pixeltable.functions.huggingface import clip_image, clip_text
from pixeltable.iterators import FrameIterator
import PIL.Image

video_table = pxt.create_table('videos', {'video': pxt.Video})

video_table.insert([{'video': '/video.mp4'}])

frames_view = pxt.create_view(
    'frames', video_table, iterator=FrameIterator.create(video=video_table.video))

@pxt.expr_udf
def embed_image(img: PIL.Image.Image):
    return clip_image(img, model_id='openai/clip-vit-base-patch32')

@pxt.expr_udf
def str_embed(s: str):
    return clip_text(s, model_id='openai/clip-vit-base-patch32')

# Create an index on the 'frame' column that allows text and image search
frames_view.add_embedding_index('frame', string_embed=str_embed, image_embed=embed_image)

# Now we will retrieve images based on a sample image
# (loaded as a PIL image so that it is embedded as an image rather than as text)
sample_image = PIL.Image.open('/image.jpeg')
sim = frames_view.frame.similarity(sample_image)
frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()

# Now we will retrieve images based on a string
sample_text = 'red truck'
sim = frames_view.frame.similarity(sample_text)
frames_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()
```

Learn how to work with [Embedding and Vector Indexes](https://docs.pixeltable.com/docs/embedding-vector-indexes).
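
Conceptually, a similarity query ranks stored embeddings by their closeness to the embedding of the query item. The sketch below shows that ranking step with cosine similarity in plain Python. It is a brute-force illustration under assumed names (`cosine_similarity`, `top_k`, the toy 2-D vectors) and says nothing about how Pixeltable's index is actually implemented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k entries most similar to `query`, best match first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 2-D "embeddings" keyed by hypothetical frame names.
index = {"frame_0": [1.0, 0.0], "frame_1": [0.6, 0.8], "frame_2": [0.0, 1.0]}
results = top_k([1.0, 0.0], index, k=2)
print(results)
```

The `order_by(sim, asc=False).limit(5)` pattern in the example above plays the role of the sort-and-slice step here.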

## 🔄 AI Stack Comparison

### 🎯 Computer Vision Workflows

| Requirement | Traditional | Pixeltable |
|-------------|---------------------|------------|
| Frame Extraction | ffmpeg + custom code | Automatic via FrameIterator |
| Object Detection | Multiple scripts + caching | Single computed column |
| Video Indexing | Custom pipelines + Vector DB | Native similarity search |
| Annotation Management | Separate tools + custom code | Label Studio integration |
| Model Evaluation | Custom metrics pipeline | Built-in mAP computation |

### 🤖 LLM Workflows

| Requirement | Traditional | Pixeltable |
|-------------|---------------------|------------|
| Document Chunking | Tool + custom code | Native DocumentSplitter |
| Embedding Generation | Separate pipeline + caching | Computed columns |
| Vector Search | External vector DB | Built-in vector indexing |
| Prompt Management | Custom tracking solution | Version-controlled columns |
| Chain Management | Tool + custom code | Computed column DAGs |
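
The "computed column DAGs" entry refers to columns that depend on other columns, so they must be evaluated in dependency order. As a small sketch of that idea using only the Python standard library (Python 3.9+), the column names below are hypothetical and the code does not reflect Pixeltable's internal scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical column names: each computed column maps to the columns it reads.
dependencies = {
    "chunk": {"document"},
    "embedding": {"chunk"},
    "prompt": {"chunk", "question"},
    "answer": {"prompt"},
}

# static_order() yields every column after all of the columns it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```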

### 🎨 Multimodal Workflows

| Requirement | Traditional | Pixeltable |
|-------------|---------------------|------------|
| Data Types | Multiple storage systems | Unified table interface |
| Cross-Modal Search | Complex integration | Native similarity support |
| Pipeline Orchestration | Multiple tools (Airflow, etc.) | Single declarative interface |
| Asset Management | Custom tracking system | Automatic lineage |
| Quality Control | Multiple validation tools | Computed validation columns |

## ❓ FAQ

### What is Pixeltable?

Pixeltable unifies data storage, versioning, and indexing with orchestration and model versioning under a declarative
table interface, with transformations, model inference, and custom logic represented as computed columns.

### What problems does Pixeltable solve?

Today's solutions for AI app development require extensive custom coding and infrastructure plumbing.
Tracking lineage and versions between and across data transformations, models, and deployments is cumbersome.
Pixeltable lets ML Engineers and Data Scientists focus on exploration, modeling, and app development without
dealing with the customary data plumbing.

### What does Pixeltable provide?

Pixeltable provides:

- Data storage and versioning
- Combined data and model lineage
- Indexing (e.g., embedding vectors) and data retrieval
- Orchestration of multimodal workloads
- Incremental updates
- Automatically production-ready code

### Why should you use Pixeltable?

- **It gives you transparency and reproducibility**
  - All generated data is automatically recorded and versioned
  - You will never need to re-run a workload because you lost track of the input data
- **It saves you money**
  - All data changes are automatically incremental
  - You never need to re-run pipelines from scratch because you're adding data
- **It integrates with any existing Python code or libraries**
  - Bring your ever-changing code and workloads
  - You choose the models, tools, and AI practices (e.g., your embedding model for a vector index);
    Pixeltable orchestrates the data

### What does Pixeltable not provide?

- Pixeltable is not a low-code, prescriptive AI solution. We empower you to use the best frameworks and techniques for
  your specific needs.
- We do not aim to replace your existing AI toolkit, but rather enhance it by streamlining the underlying data
  infrastructure and orchestration.

> [!TIP]
> Check out the [Integrations](https://pixeltable.readme.io/docs/working-with-openai) section, and feel free to submit
> a request for additional ones.

## 🀝 Contributing to Pixeltable

We're excited to welcome contributions from the community! Here's how you can get involved:

### 🐛 Report Issues

- Found a bug? [Open an issue](https://github.com/pixeltable/pixeltable/issues)
- Include steps to reproduce and environment details

### 💡 Submit Changes

- Fork the repository
- Create a feature branch
- Submit a [pull request](https://github.com/pixeltable/pixeltable/pulls)
- See our [Contributing Guide](CONTRIBUTING.md) for detailed instructions

### 💬 Join the Discussion

- Have questions? Start a [Discussion](https://github.com/pixeltable/pixeltable/discussions)
- Share your Pixeltable projects and use cases
- Help others in the community

### 📝 Improve Documentation

- Suggest examples and tutorials
- Propose improvements

## 🏒 License

This library is licensed under the Apache 2.0 License.

            

Raw data

            {
    "_id": null,
    "home_page": "https://pixeltable.com/",
    "name": "pixeltable",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": "data-science, machine-learning, database, ai, computer-vision, chatbot, ml, artificial-intelligence, feature-engineering, multimodal, mlops, feature-store, vector-database, llm, genai",
    "author": "Pixeltable, Inc.",
    "author_email": "contact@pixeltable.com",
    "download_url": "https://files.pythonhosted.org/packages/89/30/5fb745df7a071a6d074257e02743388c92fc3da6599244307ab9629e165b/pixeltable-0.2.28.tar.gz",
    "platform": null,
    "description": "<div align=\"center\">\n<img src=\"https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/pixeltable-logo-large.png\"\n     alt=\"Pixeltable\" width=\"50%\" />\n<br></br>\n\n<h2>AI Data Infrastructure \u2014 Declarative, Multimodal, and Incremental</h2>\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-0530AD.svg)](https://opensource.org/licenses/Apache-2.0)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pixeltable?logo=python&logoColor=white&)\n![Platform Support](https://img.shields.io/badge/platform-Linux%20%7C%20macOS%20%7C%20Windows-E5DDD4)\n<br>\n[![tests status](https://github.com/pixeltable/pixeltable/actions/workflows/pytest.yml/badge.svg)](https://github.com/pixeltable/pixeltable/actions/workflows/pytest.yml)\n[![tests status](https://github.com/pixeltable/pixeltable/actions/workflows/nightly.yml/badge.svg)](https://github.com/pixeltable/pixeltable/actions/workflows/nightly.yml)\n[![PyPI Package](https://img.shields.io/pypi/v/pixeltable?color=4D148C)](https://pypi.org/project/pixeltable/)\n[![My Discord (1306431018890166272)](https://img.shields.io/badge/\ud83d\udcac-Discord-%235865F2.svg)](https://discord.gg/QPyqFYx2UN)\n<a target=\"_blank\" href=\"https://huggingface.co/Pixeltable\">\n  <img src=\"https://img.shields.io/badge/\ud83e\udd17-HF Space-FF7D04\" alt=\"Visit our Hugging Face space\"/>\n</a>\n\n[Installation](https://docs.pixeltable.com/docs/installation) |\n[Documentation](https://pixeltable.readme.io/) |\n[API Reference](https://pixeltable.github.io/pixeltable/) |\n[Code Samples](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#-code-samples) |\n[Computer Vision](https://docs.pixeltable.com/docs/object-detection-in-videos) |\n[LLM](https://docs.pixeltable.com/docs/document-indexing-and-rag)\n</div>\n\nPixeltable is a Python library providing a declarative interface for multimodal data (text, images, audio, video).\nIt features built-in versioning, lineage 
tracking, and incremental updates, enabling users to **store**, **transform**,\n**index**, and **iterate** on data for their ML workflows.\n\nData transformations, model inference, and custom logic are embedded as **computed columns**.\n\n- **Load/Query all data types**: Interact with\n    [video data](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#import-media-data-into-pixeltable-videos-images-audio)\n    at the [frame level](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#text-and-image-similarity-search-on-video-frames-with-embedding-indexes)\n    and documents at the [chunk level](https://github.com/pixeltable/pixeltable?tab=readme-ov-file#automate-data-operations-with-views-eg-split-documents-into-chunks)\n- **Incremental updates for data transformation**: Maintain an\n    [embedding index](https://docs.pixeltable.com/docs/embedding-vector-indexes) colocated with your data\n- **Lazy evaluation and cache management**: Eliminates the need for\n    [manual frame extraction](https://docs.pixeltable.com/docs/object-detection-in-videos)\n- **Integrates with any Python libraries**: Use\n    [built-in and custom functions (UDFs)](https://docs.pixeltable.com/docs/user-defined-functions-udfs)\n    without complex pipelines\n- **Data format agnostic and extensibility**: Access tables as Parquet files,\n    [PyTorch datasets](https://pixeltable.github.io/pixeltable/api/data-frame/#pixeltable.DataFrame.to_pytorch_dataset),\n    or [COCO annotations](https://pixeltable.github.io/pixeltable/api/table/#pixeltable.Table.to_coco_dataset)\n\n## \ud83d\udcbe Installation\n\n```python\npip install pixeltable\n```\n\n**Pixeltable is persistent. 
Unlike in-memory Python libraries such as Pandas, Pixeltable is a database.**\n\n## \ud83d\udca1 Getting Started\n\nLearn how to create tables, populate them with data, and enhance them with built-in or user-defined transformations.\n\n| Topic | Notebook | Topic | Notebook |\n|:----------|:-----------------|:-------------------------|:---------------------------------:|\n| 10-Minute Tour of Pixeltable    | <a target=\"_blank\" href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/pixeltable-basics.ipynb\"> <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> </a> | Tables and Data Operations    | <a target=\"_blank\" href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/fundamentals/tables-and-data-operations.ipynb\"> <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> </a> |\n| User-Defined Functions (UDFs)    | <a target=\"_blank\" href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/udfs-in-pixeltable.ipynb\"> <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> </a> | Object Detection Models | <a target=\"_blank\" href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/use-cases/object-detection-in-videos.ipynb\"> <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> </a> |\n| Incremental Prompt Engineering | <a target=\"_blank\" href=\"https://colab.research.google.com/github/mistralai/cookbook/blob/main/third_party/Pixeltable/incremental_prompt_engineering_and_model_comparison.ipynb\"> <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Github\"/> | Working with External Files    | <a target=\"_blank\" 
href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/feature-guides/working-with-external-files.ipynb\"> <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> </a> |\n| Integrating with Label Studio    | <a target=\"_blank\" href=\"https://pixeltable.readme.io/docs/label-studio\"> <img src=\"https://img.shields.io/badge/\ud83d\udcda Documentation-013056\" alt=\"Visit our documentation\"/></a> | Audio/Video Transcript Indexing    | <a target=\"_blank\" href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/use-cases/audio-transcriptions.ipynb\">  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> |\n| Multimodal Application    | <a target=\"_blank\" href=\"https://huggingface.co/spaces/Pixeltable/Multimodal-Powerhouse\"> <img src=\"https://img.shields.io/badge/\ud83e\udd17-Gradio App-FF7D04\" alt=\"Visit our Hugging Face Space\"/></a> | Document Indexing and RAG    | <a target=\"_blank\" href=\"https://colab.research.google.com/github/pixeltable/pixeltable/blob/release/docs/notebooks/use-cases/rag-demo.ipynb\">  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/> |\n| Context-Aware Discord Bot    | <a target=\"_blank\" href=\"https://github.com/pixeltable/pixeltable/blob/main/docs/sample-apps/context-aware-discord-bot\"> <img src=\"https://img.shields.io/badge/%F0%9F%92%AC-Discord Bot-%235865F2.svg\" alt=\"Visit our documentation\"/></a> | Image/Text Similarity Search  | <a target=\"_blank\" href=\"https://github.com/pixeltable/pixeltable/tree/main/docs/sample-apps/text-and-image-similarity-search-nextjs-fastapi\">  <img src=\"https://img.shields.io/badge/\ud83d\udda5\ufe0f-Next.js + FastAPI-black.svg\" alt=\"Open In Colab\"/> |\n\n## \ud83e\uddf1 Code Samples\n\n### Import media data into Pixeltable (videos, images, audio...)\n\n```python\nimport 
pixeltable as pxt\n\nv = pxt.create_table('external_data.videos', {'video': pxt.Video})\n\nprefix = 's3://multimedia-commons/'\npaths = [\n    'data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4',\n    'data/videos/mp4/ffe/feb/ffefebb41485539f964760e6115fbc44.mp4',\n    'data/videos/mp4/ffe/f73/ffef7384d698b5f70d411c696247169.mp4'\n]\nv.insert({'video': prefix + p} for p in paths)\n```\n\nLearn how to [work with data in Pixeltable](https://pixeltable.readme.io/docs/working-with-external-files).\n\n### Object detection in images using DETR model\n\n```python\nimport pixeltable as pxt\nfrom pixeltable.functions import huggingface\n\n# Create a table to store data persistently\nt = pxt.create_table('image', {'image': pxt.Image})\n\n# Insert some images\nprefix = 'https://upload.wikimedia.org/wikipedia/commons'\npaths = [\n    '/1/15/Cat_August_2010-4.jpg',\n    '/e/e1/Example_of_a_Dog.jpg',\n    '/thumb/b/bf/Bird_Diversity_2013.png/300px-Bird_Diversity_2013.png'\n]\nt.insert({'image': prefix + p} for p in paths)\n\n# Add a computed column for image classification\nt.add_computed_column(classification=huggingface.detr_for_object_detection(\n    t.image,\n    model_id='facebook/detr-resnet-50'\n))\n\n# Retrieve the rows where cats have been identified\nt.select(animal = t.image,\n         classification = t.classification.label_text[0]) \\\n.where(t.classification.label_text[0]=='cat').head()\n```\n\nLearn about computed columns and object detection:\n[Comparing object detection models](https://pixeltable.readme.io/docs/object-detection-in-videos).\n\n### Extend Pixeltable's capabilities with user-defined functions\n\n```python\n@pxt.udf\ndef draw_boxes(img: PIL.Image.Image, boxes: list[list[float]]) -> PIL.Image.Image:\n    result = img.copy()  # Create a copy of `img`\n    d = PIL.ImageDraw.Draw(result)\n    for box in boxes:\n        d.rectangle(box, width=3)  # Draw bounding box rectangles on the copied image\n    return result\n```\n\nLearn more about 
user-defined functions:\n[UDFs in Pixeltable](https://pixeltable.readme.io/docs/user-defined-functions-udfs).\n\n### Automate data operations with views, e.g., split documents into chunks\n\n```python\n# In this example, the view is defined by iteration over the chunks of a DocumentSplitter\nchunks_table = pxt.create_view(\n    'rag_demo.chunks',\n    documents_table,\n    iterator=DocumentSplitter.create(\n        document=documents_table.document,\n        separators='token_limit', limit=300)\n)\n```\n\nLearn how to leverage views to build your\n[RAG workflow](https://pixeltable.readme.io/docs/document-indexing-and-rag).\n\n### Evaluate model performance\n\n```python\n# The computation of the mAP metric can become a query over the evaluation output\nframes_view.select(mean_ap(frames_view.eval_yolox_tiny), mean_ap(frames_view.eval_yolox_m)).show()\n```\n\nLearn how to leverage Pixeltable for [Model analytics](https://pixeltable.readme.io/docs/object-detection-in-videos).\n\n### Working with inference services\n\n```python\nchat_table = pxt.create_table('together_demo.chat', {'input': pxt.String})\n\n# The chat-completions API expects JSON-formatted input:\nmessages = [{'role': 'user', 'content': chat_table.input}]\n\n# This example shows how additional parameters from the Together API can be used in Pixeltable\nchat_table.add_computed_column(\n    output=chat_completions(\n        messages=messages,\n        model='mistralai/Mixtral-8x7B-Instruct-v0.1',\n        max_tokens=300,\n        stop=['\\n'],\n        temperature=0.7,\n        top_p=0.9,\n        top_k=40,\n        repetition_penalty=1.1,\n        logprobs=1,\n        echo=True\n    )\n)\nchat_table.add_computed_column(\n    response=chat_table.output.choices[0].message.content\n)\n\n# Start a conversation\nchat_table.insert([\n    {'input': 'How many species of felids have been classified?'},\n    {'input': 'Can you make me a coffee?'}\n])\nchat_table.select(chat_table.input, 
chat_table.response).head()\n```\n\nLearn how to interact with inference services such as [Together AI](https://pixeltable.readme.io/docs/together-ai) in Pixeltable.\n\n### Text and image similarity search on video frames with embedding indexes\n\n```python\nimport pixeltable as pxt\nfrom pixeltable.functions.huggingface import clip_image, clip_text\nfrom pixeltable.iterators import FrameIterator\nimport PIL.Image\n\nvideo_table = pxt.create_table('videos', {'video': pxt.Video})\n\nvideo_table.insert([{'video': '/video.mp4'}])\n\nframes_view = pxt.create_view(\n    'frames', video_table, iterator=FrameIterator.create(video=video_table.video))\n\n@pxt.expr_udf\ndef embed_image(img: PIL.Image.Image):\n    return clip_image(img, model_id='openai/clip-vit-base-patch32')\n\n@pxt.expr_udf\ndef str_embed(s: str):\n    return clip_text(s, model_id='openai/clip-vit-base-patch32')\n\n# Create an index on the 'frame' column that allows text and image search\nframes_view.add_embedding_index('frame', string_embed=str_embed, image_embed=embed_image)\n\n# Now we will retrieve images based on a sample image\nsample_image = '/image.jpeg'\nsim = frames_view.frame.similarity(sample_image)\nframes_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()\n\n# Now we will retrieve images based on a string\nsample_text = 'red truck'\nsim = frames_view.frame.similarity(sample_text)\nframes_view.order_by(sim, asc=False).limit(5).select(frames_view.frame, sim=sim).collect()\n```\n\nLearn how to work with [Embedding and Vector Indexes](https://docs.pixeltable.com/docs/embedding-vector-indexes).\n\n## \ud83d\udd04 AI Stack Comparison\n\n### \ud83c\udfaf Computer Vision Workflows\n\n| Requirement | Traditional | Pixeltable |\n|-------------|---------------------|------------|\n| Frame Extraction | ffmpeg + custom code | Automatic via FrameIterator |\n| Object Detection | Multiple scripts + caching | Single computed column |\n| Video Indexing | Custom pipelines + 
Vector DB | Native similarity search |\n| Annotation Management | Separate tools + custom code | Label Studio integration |\n| Model Evaluation | Custom metrics pipeline | Built-in mAP computation |\n\n### \ud83e\udd16 LLM Workflows\n\n| Requirement | Traditional | Pixeltable |\n|-------------|---------------------|------------|\n| Document Chunking | Tool + custom code | Native DocumentSplitter |\n| Embedding Generation | Separate pipeline + caching | Computed columns |\n| Vector Search | External vector DB | Built-in vector indexing |\n| Prompt Management | Custom tracking solution | Version-controlled columns |\n| Chain Management | Tool + custom code | Computed column DAGs |\n\n### \ud83c\udfa8 Multimodal Workflows\n\n| Requirement | Traditional | Pixeltable |\n|-------------|---------------------|------------|\n| Data Types | Multiple storage systems | Unified table interface |\n| Cross-Modal Search | Complex integration | Native similarity support |\n| Pipeline Orchestration | Multiple tools (Airflow, etc.) | Single declarative interface |\n| Asset Management | Custom tracking system | Automatic lineage |\n| Quality Control | Multiple validation tools | Computed validation columns |\n\n## \u2753 FAQ\n\n### What is Pixeltable?\n\nPixeltable unifies data storage, versioning, and indexing with orchestration and model versioning under a declarative\ntable interface, with transformations, model inference, and custom logic represented as computed columns.\n\n### What problems does Pixeltable solve?\n\nToday's solutions for AI app development require extensive custom coding and infrastructure plumbing.\nTracking lineage and versions between and across data transformations, models, and deployments is cumbersome.\nPixeltable lets ML Engineers and Data Scientists focus on exploration, modeling, and app development without\ndealing with the customary data plumbing.\n\n### What does Pixeltable provide me with? 
\n\nPixeltable provides:\n\n- Data storage and versioning\n- Combined data and model lineage\n- Indexing (e.g., embedding vectors) and data retrieval\n- Orchestration of multimodal workloads\n- Incremental updates\n- Automatically production-ready code\n\n### Why should you use Pixeltable?\n\n- **It gives you transparency and reproducibility**\n  - All generated data is automatically recorded and versioned\n  - You will never need to re-run a workload because you lost track of the input data\n- **It saves you money**\n  - All data changes are automatically incremental\n  - You never need to re-run pipelines from scratch just because you\u2019re adding data\n- **It integrates with any existing Python code or libraries**\n  - Bring your ever-changing code and workloads\n  - You choose the models, tools, and AI practices (e.g., your embedding model for a vector index);\n    Pixeltable orchestrates the data\n\n### What does Pixeltable not provide?\n\n- Pixeltable is not a low-code, prescriptive AI solution. We empower you to use the best frameworks and techniques for\n  your specific needs.\n- We do not aim to replace your existing AI toolkit, but rather to enhance it by streamlining the underlying data\n  infrastructure and orchestration.\n\n> [!TIP]\n> Check out the [Integrations](https://pixeltable.readme.io/docs/working-with-openai) section, and feel free to submit\n> a request for additional ones.\n\n## \ud83e\udd1d Contributing to Pixeltable\n\nWe're excited to welcome contributions from the community! Here's how you can get involved:\n\n### \ud83d\udc1b Report Issues\n\n- Found a bug? 
[Open an issue](https://github.com/pixeltable/pixeltable/issues)\n- Include steps to reproduce and environment details\n\n### \ud83d\udca1 Submit Changes\n\n- Fork the repository\n- Create a feature branch\n- Submit a [pull request](https://github.com/pixeltable/pixeltable/pulls)\n- See our [Contributing Guide](CONTRIBUTING.md) for detailed instructions\n\n### \ud83d\udcac Join the Discussion\n\n- Have questions? Start a [Discussion](https://github.com/pixeltable/pixeltable/discussions)\n- Share your Pixeltable projects and use cases\n- Help others in the community\n\n### \ud83d\udcdd Improve Documentation\n\n- Suggest examples and tutorials\n- Propose improvements\n\n## \ud83c\udfe2 License\n\nThis library is licensed under the Apache 2.0 License.\n",
    "bugtrack_url": null,
    "license": "Apache-2.0",
    "summary": "AI Data Infrastructure: Declarative, Multimodal, and Incremental",
    "version": "0.2.28",
    "project_urls": {
        "Documentation": "https://docs.pixeltable.com/",
        "Homepage": "https://pixeltable.com/",
        "Repository": "https://github.com/pixeltable/pixeltable"
    },
    "split_keywords": [
        "data-science",
        " machine-learning",
        " database",
        " ai",
        " computer-vision",
        " chatbot",
        " ml",
        " artificial-intelligence",
        " feature-engineering",
        " multimodal",
        " mlops",
        " feature-store",
        " vector-database",
        " llm",
        " genai"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "952db037756271b4b4635c4334be2e0bcf0674e25aa063ee8e07c01d45176716",
                "md5": "186b2158701cc005356ff42e4fafd6eb",
                "sha256": "59e869c7546953e438342384dde3473cb5571b276abc22c469da13192d62578d"
            },
            "downloads": -1,
            "filename": "pixeltable-0.2.28-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "186b2158701cc005356ff42e4fafd6eb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 335695,
            "upload_time": "2024-12-11T01:42:30",
            "upload_time_iso_8601": "2024-12-11T01:42:30.524104Z",
            "url": "https://files.pythonhosted.org/packages/95/2d/b037756271b4b4635c4334be2e0bcf0674e25aa063ee8e07c01d45176716/pixeltable-0.2.28-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "89305fb745df7a071a6d074257e02743388c92fc3da6599244307ab9629e165b",
                "md5": "726b58fb1c5498d6afba09ef13972ac5",
                "sha256": "5e358e8ba8a31590ccc36b2f5ab3d09d0511aef4a16270a2444b9e8f8163edf9"
            },
            "downloads": -1,
            "filename": "pixeltable-0.2.28.tar.gz",
            "has_sig": false,
            "md5_digest": "726b58fb1c5498d6afba09ef13972ac5",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 273291,
            "upload_time": "2024-12-11T01:42:33",
            "upload_time_iso_8601": "2024-12-11T01:42:33.223345Z",
            "url": "https://files.pythonhosted.org/packages/89/30/5fb745df7a071a6d074257e02743388c92fc3da6599244307ab9629e165b/pixeltable-0.2.28.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-11 01:42:33",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "pixeltable",
    "github_project": "pixeltable",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "pixeltable"
}