parxy

- Name: parxy
- Version: 0.10.0
- Summary: Parxy document processing gateway
- Author: Alessio Vertemati <alessio@oneofftech.xyz>
- Requires Python: >=3.12
- Upload time: 2025-10-06 07:30:23

![pypi](https://img.shields.io/pypi/v/parxy.svg)
[![Pydantic v2](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/pydantic/pydantic/main/docs/badge/v2.json)](https://docs.pydantic.dev/latest/contributing/#badges) [![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv) [![CI](https://github.com/OneOffTech/parxy/actions/workflows/ci.yml/badge.svg)](https://github.com/OneOffTech/parxy/actions/workflows/ci.yml)

# OneOffTech Parxy

Parxy is a document processing gateway: it provides a single interface to multiple document parsing services and exposes a unified, flexible document model that supports different levels of text-extraction granularity.

- Unified API to parse documents with different providers
- Unified, flexible, hierarchical document model (`page → block → line → span → character`)
- Supports both **local libraries** (e.g., PyMuPDF, Unstructured) and **remote services** (e.g., LlamaParse, LLMWhisperer, PdfAct)
- Extensible: easily integrate new parsers in your own code
- Execution tracing for debugging
- Pair with evaluation utilities to compare extraction results (coming soon)
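
The hierarchical model can be pictured as nested containers in which each level aggregates the text of the level below it. The sketch below uses hypothetical dataclasses (`Page`, `Block`, `Line`, `Span`, `Character` and the `span_from` helper are illustrative assumptions, not Parxy's actual model classes) to show how the same content can be read at page, line or span granularity. The separators used when joining text are also assumptions.

```python
from dataclasses import dataclass, field


# Hypothetical classes mirroring the page → block → line → span → character
# hierarchy; these are NOT Parxy's actual models.
@dataclass
class Character:
    value: str


@dataclass
class Span:
    characters: list[Character] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "".join(c.value for c in self.characters)


@dataclass
class Line:
    spans: list[Span] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: spans on the same line are joined without a separator
        return "".join(s.text for s in self.spans)


@dataclass
class Block:
    lines: list[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


@dataclass
class Page:
    blocks: list[Block] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumption: blocks are separated by a blank line
        return "\n\n".join(b.text for b in self.blocks)


def span_from(text: str) -> Span:
    """Hypothetical helper: build a span character by character."""
    return Span([Character(ch) for ch in text])


page = Page([
    Block([Line([span_from("Hello, "), span_from("world")])]),
    Block([Line([span_from("Second block")])]),
])

print(page.text)                     # text at page granularity
print(page.blocks[0].lines[0].text)  # text of a single line
```

Because every level derives its text from the level below, a consumer can choose the granularity it needs without a separate extraction pass.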

> [!NOTE]  
> Parxy is being rewritten from the ground up. Versions 0.6 and below are preserved in the legacy branch for historical purposes. The main branch contains the rewrite, which focuses on library and CLI usage. If you still need the HTTP API, continue using version 0.6.

**Requirements**

- Python 3.12 or above (Python 3.10 and 3.11 are supported on a best-effort basis).


**Next steps**

- [Getting started](#getting-started)
    - [The Parxy CLI](#use-on-the-command-line)
    - [Install the library in your application](#use-as-a-library-in-your-project)
- [Supported document processing services](#supported-services)
- [Personalize drivers](#live-extension)

## Getting started

Parxy is available both as a standalone command-line tool and as a library. The quickest way to try it out is from the command line using [`uvx`](https://docs.astral.sh/uv/concepts/tools/#execution-vs-installation).


To run with a minimal footprint (fewer drivers supported):

```bash
uvx parxy --help
```

To run with all supported drivers:

```bash
uvx 'parxy[all]' --help
```

See [Supported services](#supported-services) for the list of included drivers and the extra each one requires at installation.

### Use on the command line

_to be documented_

### Use as a library in your project

_to be documented_

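1. Install Parxy, either with all drivers (`parxy[all]`) or with only the extras for the drivers you need

2. Set the environment variables required by the driver, if any (for example, credentials for remote services)

3. Call the driver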


```python
from parxy_core.facade import Parxy

# Parse with the default driver (usually PyMuPDF)
document = Parxy.parse('path/to/document.pdf')

# Parse with a specific driver; LlamaParse requires the `llama` extra
document = Parxy.driver(Parxy.LLAMAPARSE).parse('path/to/document.pdf')
```

## Supported services

| Service or Library | Support status | Extra | Local file | Remote file | 
|--------------------|----------------|-------|------------|-------------|
| [**PyMuPDF**](https://pymupdf.readthedocs.io/en/latest/) | Live | - | ✅ | ✅ |
| [**PdfAct**](https://github.com/data-house/pdfact) | Live | - | ✅ | ✅ |
| [**Unstructured** library](https://docs.unstructured.io/open-source/introduction/overview) | Preview | `unstructured_local` | ✅ | ✅ |
| [**LlamaParse**](https://docs.cloud.llamaindex.ai/llamaparse/overview) | Preview | `llama` | ✅ | ✅ |
| [**LLMWhisperer**](https://docs.unstract.com/llmwhisperer/index.html) | Preview | `llmwhisperer` | ✅ | ✅ |
| [**Unstructured.io** cloud service](https://docs.unstructured.io/open-source/introduction/overview) | Planned |  |  |  |
| [**Chunkr**](https://www.chunkr.ai/) | Planned |  |  |  |
| [**Docling**](https://docling-project.github.io/docling/) | Planned |  |  |  |


...and more can be added via the [live extension](#live-extension)!


### Live extension

Live extension lets you add new drivers, or create custom configurations of existing drivers, directly in your application code.

1. Create a class that inherits from `Driver` and implements the `_handle` method

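```python
from parxy_core.drivers import Driver
from parxy_core.models import Document

class CustomDriverExample(Driver):
    """Example custom driver for testing."""

    def _handle(self, file, level="page") -> Document:
        # Parse `file` at the requested granularity (`level`) and return
        # a Document; this example simply returns an empty document.
        return Document(pages=[])
```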

2. Register it in Parxy using the `extend` method

```python
from parxy_core.facade import Parxy

# The callback returns a new instance of the custom driver
Parxy.extend(name='my_parser', callback=lambda: CustomDriverExample())
```

3. Use it

```python
Parxy.driver('my_parser').parse('path/to/document.pdf')
```
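
Conceptually, `extend` stores a factory callback under a name and `driver` looks that name up to obtain a driver instance, while a `Driver` subclass only implements `_handle`. The sketch below illustrates that pattern with self-contained, hypothetical classes (`BaseDriver`, `DriverRegistry`, `SimpleDocument`, `EchoDriver`); it assumes a public `parse` method that delegates to `_handle` and that the callback runs when the driver is requested. It is not Parxy's actual implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SimpleDocument:
    # Stand-in for a parsed document; NOT parxy_core.models.Document
    pages: list[str] = field(default_factory=list)


class BaseDriver(ABC):
    """Hypothetical base class: `parse` is public, `_handle` does the work."""

    def parse(self, file: str, level: str = "page") -> SimpleDocument:
        # Shared pre-processing (validation, tracing, ...) would live here
        if not file:
            raise ValueError("A file path is required")
        return self._handle(file, level=level)

    @abstractmethod
    def _handle(self, file: str, level: str = "page") -> SimpleDocument: ...


class DriverRegistry:
    """Hypothetical registry mapping driver names to factory callbacks."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], BaseDriver]] = {}

    def extend(self, name: str, callback: Callable[[], BaseDriver]) -> None:
        self._factories[name] = callback

    def driver(self, name: str) -> BaseDriver:
        try:
            factory = self._factories[name]
        except KeyError:
            raise ValueError(f"Driver '{name}' is not registered") from None
        # A new instance is created on every call; a real registry may cache it
        return factory()


class EchoDriver(BaseDriver):
    def _handle(self, file: str, level: str = "page") -> SimpleDocument:
        return SimpleDocument(pages=[f"{file} parsed at {level} level"])


registry = DriverRegistry()
registry.extend(name="my_parser", callback=lambda: EchoDriver())
doc = registry.driver("my_parser").parse("path/to/document.pdf")
print(doc.pages)
```

One reason to register a callback rather than a ready-made instance is that construction, which may involve reading credentials or loading models, is deferred until the driver is actually used.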

## Contributing

Thank you for considering contributing to Parxy! Our [contribution guide](./.github/CONTRIBUTING.md) explains how to get started.

### Development

Parxy uses [uv](https://docs.astral.sh/uv/) as its package and project manager.

1. Clone the repository
1. Sync all dependencies with `uv sync --all-extras`

All Parxy code is located in the `src` directory:

- `parxy_core` contains the driver implementations, the models, and the facade and factory used to access Parxy features
- `parxy_cli` contains the module providing the command line interface


#### Optional Dependencies vs Dependency Groups

Parxy uses _optional dependencies_ to track user-oriented dependencies that enhance functionality, while dependency groups are reserved for development. When adding support for a new driver, consider declaring its dependencies as optional to keep Parxy's footprint small.

The question [What’s the difference between optional-dependencies and dependency-groups in pyproject.toml?](https://github.com/astral-sh/uv/issues/9011) gives a good overview of the differences.

### Testing

Parxy is tested with Pytest. Tests are located in the `tests` folder and run on every commit and pull request.

To run the test suite:

```bash
uv run pytest
```

You can run linting via:

```bash
uv run ruff check
```


## Security Vulnerabilities

Please review our [security policy](./.github/SECURITY.md) for how to report security vulnerabilities.


## Supporters

The project is provided and supported by OneOff-Tech (UG) and Alessio Vertemati.

<p align="left"><a href="https://oneofftech.de" target="_blank"><img src="https://raw.githubusercontent.com/OneOffTech/.github/main/art/oneofftech-logo.svg" width="200"></a></p>


## Licence and Copyright

Parxy is licensed under the [GPL v3 licence](./LICENCE).

- Copyright (c) 2025-present Alessio Vertemati, @avvertix
- Copyright (c) 2025-present Oneoff-tech UG, www.oneofftech.de
- All contributors

            
