| Field | Value |
| --- | --- |
| Name | long2short |
| Version | 0.1.3 |
| Summary | A flexible text summarization library for summarizing long documents, with support for multiple LLM providers |
| Upload time | 2025-01-23 11:13:46 |
| Requires Python | >=3.8 |
| Keywords | summarization, text-processing, llms, openai, anthropic |
# long2short 📜
**long2short** is a flexible Python library for summarizing long documents with multiple Language Model (LLM) providers. It gives you fine-grained control over the level of detail in the summary, and its extensible architecture makes it easy to integrate additional LLMs and customize behavior.
---
## Features
- **Multi-LLM Support**: Compatible with OpenAI, Anthropic, and custom LLM providers.
- **Detail Control**: Adjust the level of detail in the summary with a simple parameter.
- **Smart Chunking**: Automatically splits and processes large texts based on token limits.
- **Recursive Summarization**: Uses previous summaries as context for summarizing subsequent sections.
- **Custom Instructions**: Add domain-specific instructions for tailored summarization.
- **Progress Tracking**: Visualize progress with `tqdm`.
- **Extensible Design**: Add new LLM providers or customize existing ones with ease.
---
## Installation
Install the library using pip:
```bash
pip install long2short
```
---
## Quick Start
Here’s how to get started with **long2short** using OpenAI as the LLM provider:
```python
from long2short import Long2Short, OpenAIProvider

# Initialize the provider
provider = OpenAIProvider(api_key="your-api-key")
summarizer = Long2Short(provider)

# Summarize text
text = "Your long text here..."
summary = summarizer.summarize(text, detail=0.5)
print(summary)
```
---
## Using Different Providers
### OpenAI
To use OpenAI’s GPT models:
```python
from long2short import Long2Short, OpenAIProvider
provider = OpenAIProvider(
    api_key="your-openai-api-key",
    model="gpt-4-turbo"  # Specify your preferred model
)
summarizer = Long2Short(provider)
```
### Anthropic (Claude)
To use Anthropic’s Claude models:
```python
from long2short import Long2Short, AnthropicProvider
provider = AnthropicProvider(
    api_key="your-anthropic-api-key",
    model="claude-3-opus-20240229"  # Specify your preferred model
)
summarizer = Long2Short(provider)
```
---
## Controlling Summary Detail
The `detail` parameter, a float from 0 to 1, controls how detailed the summary should be: 0 yields the briefest summary, 1 the most detailed:
```python
# Generate a brief, high-level summary
brief_summary = summarizer.summarize(text, detail=0)
# Generate a detailed, in-depth summary
detailed_summary = summarizer.summarize(text, detail=1)
```
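The README attributes its approach to the OpenAI Cookbook, where `detail` linearly interpolates between a single chunk (briefest summary) and one chunk per minimum-chunk-size tokens (most detailed). A minimal sketch of that idea — the helper name is illustrative, not part of the library's API:

```python
def interpolate_chunk_count(num_tokens, detail, minimum_chunk_size=500):
    """Map detail in [0, 1] to a chunk count: detail=0 gives one chunk
    (a single brief summary), detail=1 gives roughly one chunk per
    minimum_chunk_size tokens (one summary per small section)."""
    if not 0 <= detail <= 1:
        raise ValueError("detail must be between 0 and 1")
    max_chunks = max(1, num_tokens // minimum_chunk_size)
    # Linear interpolation between 1 chunk and max_chunks chunks
    return 1 + round(detail * (max_chunks - 1))
```

With a 10,000-token document and the default minimum chunk size, `detail=0` would produce 1 chunk and `detail=1` would produce 20.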
---
## Advanced Features
### Recursive Summarization
Enable recursive summarization to use previous summaries as context for generating new ones:
```python
summary = summarizer.summarize(
    text,
    detail=0.5,
    summarize_recursively=True
)
```
### Custom Instructions
Tailor the summary with additional instructions:
```python
summary = summarizer.summarize(
    text,
    detail=0.5,
    additional_instructions="Focus on numerical data and statistics."
)
```
### Smart Text Chunking
Large texts are automatically split into manageable chunks based on token limits, ensuring efficient processing. You can control:
- Minimum chunk size (`minimum_chunk_size`)
- Chunk delimiters (`chunk_delimiter`)
- Headers for each chunk (`header`)
Example:
```python
summary = summarizer.summarize(
    text,
    detail=0.7,
    minimum_chunk_size=500,
    chunk_delimiter=".",
    header="Section Summary"
)
```
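One plausible implementation of delimiter-based chunking, sketched here with character counts standing in for the library's token counts (the function name and exact packing rule are assumptions, not the library's code):

```python
def chunk_on_delimiter(text, minimum_chunk_size=500, chunk_delimiter="."):
    """Greedily pack delimiter-separated pieces into chunks of at least
    `minimum_chunk_size` characters. The real library counts tokens;
    characters stand in here to keep the sketch dependency-free."""
    pieces = [p for p in text.split(chunk_delimiter) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        current += piece + chunk_delimiter
        if len(current) >= minimum_chunk_size:
            chunks.append(current.strip())
            current = ""
    if current.strip():
        chunks.append(current.strip())  # flush the trailing remainder
    return chunks
```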
### Verbose Output
Enable detailed logging to track the summarization process:
```python
summary = summarizer.summarize(
    text,
    detail=0.5,
    verbose=True
)
```
### Handling Dropped Chunks
Chunks that exceed the model's token limit are skipped, and any dropped chunks are reported when verbose mode is enabled. This prevents token-overflow errors while keeping processing efficient.
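That skip-and-report behavior can be sketched as a simple filter. The function below is illustrative only (a whitespace split stands in for real tokenization, and the name is not part of the library's API):

```python
def filter_oversized_chunks(chunks, max_tokens, verbose=False):
    """Drop any chunk whose token count exceeds the model limit,
    optionally reporting what was skipped."""
    kept, dropped = [], 0
    for chunk in chunks:
        num_tokens = len(chunk.split())  # stand-in token count
        if num_tokens > max_tokens:
            dropped += 1
            if verbose:
                print(f"Dropping oversized chunk ({num_tokens} tokens)")
        else:
            kept.append(chunk)
    if verbose and dropped:
        print(f"{dropped} chunk(s) dropped to avoid token overflow")
    return kept
```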
---
## Creating Custom Providers
You can implement custom LLM providers by extending the `LLMProvider` abstract base class:
```python
from long2short import LLMProvider
class CustomProvider(LLMProvider):
    def __init__(self, **kwargs):
        # Initialize your provider (clients, API keys, etc.)
        pass

    def generate_completion(self, messages: list, **kwargs) -> str:
        # Implement completion generation logic for your backend
        return "Custom completion response"
```
Integrate the custom provider into Long2Short:
```python
custom_provider = CustomProvider()
summarizer = Long2Short(custom_provider)
```
---
## Progress Tracking
With `verbose=True`, the summarization loop displays a real-time `tqdm` progress bar as chunks are processed:
```python
summary = summarizer.summarize(
    text,
    detail=0.5,
    verbose=True
)
```
---
## Extensibility
### Adding New Features
- Extend functionality by overriding or extending the `Long2Short` class.
- Customize tokenization or chunking behavior by modifying `Tokenizer` or `TextChunker` classes.
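For example, a subclass could normalize whitespace before delegating to the parent's `summarize`. The sketch below stubs out `Long2Short` so it runs standalone; only the override pattern, not the stub's internals, reflects the real library:

```python
class Long2Short:  # stand-in for the real class, for illustration only
    def __init__(self, provider):
        self.provider = provider

    def summarize(self, text, **kwargs):
        # A real call would chunk `text` and summarize each piece.
        return self.provider.generate_completion(
            [{"role": "user", "content": text}], **kwargs
        )


class CleaningLong2Short(Long2Short):
    """Hypothetical subclass: collapse runs of whitespace before
    delegating to the parent's summarize()."""

    def summarize(self, text, **kwargs):
        cleaned = " ".join(text.split())
        return super().summarize(cleaned, **kwargs)
```

The same pattern applies to swapping in a custom tokenizer or chunker: override the relevant method, transform the input or output, and delegate the rest to the parent class.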
---
## Contributing
Contributions are welcome! Whether it’s reporting a bug, suggesting new features, or submitting a pull request, your help is appreciated.
To contribute:
1. Fork the repository.
2. Create a feature branch.
3. Submit a pull request.
---
## Example Usage
```python
from long2short import Long2Short, OpenAIProvider
# Initialize with OpenAI
provider = OpenAIProvider(api_key="your-api-key")
summarizer = Long2Short(provider)
# Summarize with custom instructions
text = "Your long document here..."
summary = summarizer.summarize(
    text,
    detail=0.8,
    additional_instructions="Focus on the key takeaways and technical details."
)
print("Summary:")
print(summary)
```
### Attribution
This project heavily references code and ideas from the [OpenAI Cookbook](https://cookbook.openai.com/examples/summarizing_long_documents).