langchain-progress


- Name: langchain-progress
- Version: 0.1.1
- Home page: https://github.com/wrmthorne/langchain-progress
- Summary: Wrapper for nicely displaying progress bars for langchain embedding components when using multiprocessing or ray.
- Upload time: 2024-03-19 13:04:00
- Author: William Thorne
- Requires Python: >=3.9,<4.0
- License: MIT
- Keywords: langchain progress ray wrapper langchain_community multiprocessing tqdm
- Requirements: none recorded
# Langchain Progress

A module that adds a context manager for wrapping langchain embedding objects so that their progress bars are handled cleanly. This is particularly useful with ray or multiprocessing, where it lets all remotes/processes report to a single progress bar.


## Installing

The library can be installed using PyPI:

```bash
pip install langchain-progress
```

If you only need a subset of the library's features, you can install dependencies for your chosen setup:

```bash
pip install "langchain-progress[tqdm]"
pip install "langchain-progress[ray]"
```

## How to Use

This context manager can be used in a single process or across distributed workers, such as ray remotes, to display the progress of generating embeddings with langchain. `ProgressManager` requires a langchain embeddings object and optionally accepts a progress bar; if no progress bar is provided, a new one is created using tqdm. Note that if an embeddings object is instantiated with `show_progress=True`, any progress bar that class would create internally is replaced by the one from langchain-progress.

The following example first relies on the automatically generated progress bar and then passes in an existing one:

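```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from tqdm import tqdm

from langchain_progress import ProgressManager

# docs: a list of langchain Document objects
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

# Let langchain-progress create a tqdm progress bar automatically
with ProgressManager(embeddings):
    result = FAISS.from_documents(docs, embeddings)

# Or supply an existing progress bar
pbar = tqdm(total=len(docs))
with ProgressManager(embeddings, pbar):
    result = FAISS.from_documents(docs, embeddings)
```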
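To illustrate the general idea (not langchain-progress's actual implementation), the sketch below shows one way a context manager can wrap the texts passed to an embeddings object's `embed_documents` method so that each text consumed advances a progress bar, falling back to a default bar when none is supplied. `CounterBar`, `TrackedTexts`, `FakeEmbeddings` and `progress_manager` are hypothetical names; `CounterBar` stands in for tqdm so the example needs only the standard library.

```python
from contextlib import contextmanager
from typing import Iterable, Iterator, List, Optional


class CounterBar:
    """Hypothetical stand-in for a tqdm bar: it only tracks how many steps were completed."""

    def __init__(self) -> None:
        self.n = 0

    def update(self, n: int = 1) -> None:
        self.n += n


class TrackedTexts:
    """Wraps a list of texts so that iterating over it advances `pbar` once per text."""

    def __init__(self, texts: Iterable[str], pbar: CounterBar) -> None:
        self._texts = list(texts)
        self._pbar = pbar

    def __len__(self) -> int:
        return len(self._texts)

    def __iter__(self) -> Iterator[str]:
        for text in self._texts:
            yield text
            self._pbar.update(1)  # runs once the consumer asks for the next text


class FakeEmbeddings:
    """Hypothetical embeddings class: the 'vector' of a text is just its length."""

    def embed_documents(self, texts: Iterable[str]) -> List[List[float]]:
        return [[float(len(text))] for text in texts]


@contextmanager
def progress_manager(embeddings, pbar: Optional[CounterBar] = None) -> Iterator[CounterBar]:
    """Temporarily route `embed_documents` through `TrackedTexts`, then restore it."""
    if pbar is None:
        pbar = CounterBar()  # default bar when the caller does not supply one
    original = embeddings.embed_documents
    # An instance attribute shadows the class method for the duration of the block
    embeddings.embed_documents = lambda texts: original(TrackedTexts(texts, pbar))
    try:
        yield pbar
    finally:
        del embeddings.embed_documents  # remove the shadow so the class method is used again


emb = FakeEmbeddings()
with progress_manager(emb) as bar:
    vectors = emb.embed_documents(["a", "bb", "ccc"])
print(bar.n, vectors)  # 3 [[1.0], [2.0], [3.0]]
```

Passing the same bar into several such blocks accumulates progress in one counter, which is the property the ray and multiprocessing examples depend on. Real langchain embeddings classes are pydantic models and may reject ad-hoc attribute assignment, so treat this purely as a sketch of the technique.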

### Ray Example

The main use case for this context manager is using ray or multiprocessing to speed up embedding. If `show_progress=True` is enabled on the embeddings objects, each process creates its own progress bar. These bars compete for the same terminal lines, so every update from every process triggers a redraw. This approach also provides no single progress bar showing unified progress across all remotes. The `ProgressManager` context manager solves both problems, and the `RayPBar` context manager simplifies creating a ray progress bar and passing it to the remotes. The following is the recommended way to create progress bars with ray:

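```python
import numpy as np
import ray
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_progress import ProgressManager, RayPBar

@ray.remote(num_gpus=1)
def process_shard(shard, pbar):
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

    # Report this remote's progress to the shared ray progress bar
    with ProgressManager(embeddings, pbar):
        result = FAISS.from_documents(shard, embeddings)

    return result

# docs: list of langchain Documents; num_shards: number of ray tasks to launch
doc_shards = np.array_split(docs, num_shards)

with RayPBar(total=len(docs)) as pbar:
    vectors = ray.get([process_shard.remote(shard, pbar) for shard in doc_shards])

pbar.close.remote()
```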

A full example can be found in `./examples/ray_example.py`.
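Because `RayPBar` is created with `total=len(docs)`, the bar reaches 100% only if the shards together contain every document exactly once (assuming each document advances the bar by one step). `np.array_split` guarantees this: for `l` items split into `n` shards it returns `l % n` shards of size `l // n + 1` and the rest of size `l // n`. The hypothetical `split_evenly` below mirrors that sizing rule in plain Python so it can be checked without numpy.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(items: Sequence[T], num_shards: int) -> List[List[T]]:
    """Pure-Python mirror of np.array_split's sizing rule (hypothetical helper)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    base, extra = divmod(len(items), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more item
        shards.append(list(items[start:start + size]))
        start += size
    return shards


docs = [f"doc-{i}" for i in range(10)]
shards = split_evenly(docs, 3)
print([len(s) for s in shards])  # [4, 3, 3]
```

Since the shard lengths always sum to `len(docs)`, the per-shard updates add up to exactly the bar's total.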

### Multiprocessing Example

To simplify implementing progress bars with multiprocessing, the `MultiprocessingPBar` context manager handles creating and updating a progress bar shared across processes. The following is the recommended way to create progress bars with multiprocessing:

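```python
from multiprocessing import Pool

import numpy as np
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

from langchain_progress import MultiprocessingPBar, ProgressManager

def process_shard(shard, pbar):
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

    # Report this worker's progress to the shared progress bar
    with ProgressManager(embeddings, pbar):
        result = FAISS.from_documents(shard, embeddings)

    return result

if __name__ == '__main__':
    # docs: list of langchain Documents; num_shards: number of worker processes
    doc_shards = np.array_split(docs, num_shards)

    with MultiprocessingPBar(total=len(docs)) as pbar, Pool(num_shards) as pool:
        vectors = pool.starmap(process_shard, [(shard, pbar) for shard in doc_shards])
```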

A full example can be found in `./examples/multiprocessing_example.py`.
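A single shared bar avoids redraw contention because only one place consumes progress updates. The sketch below illustrates that pattern with a queue: workers push one increment per finished document and a single listener thread sums them, which is where a real bar would redraw. This is an assumption about the general technique, not a description of how `MultiprocessingPBar` is implemented. It uses threads and `queue.Queue` so it runs anywhere; with processes, a `multiprocessing.Manager().Queue()` can be passed to `Pool` workers in the same way. `listen`, `work` and `run` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def listen(updates: "queue.Queue[int]", total: int, positions: List[int]) -> None:
    """Single consumer: drain increments until `total` is reached, recording each position."""
    done = 0
    while done < total:
        done += updates.get()  # blocks until some worker reports progress
        positions.append(done)  # a real bar would redraw here, from one place only


def work(shard: Sequence[str], updates: "queue.Queue[int]") -> int:
    """Hypothetical worker: 'embeds' each document and reports one step per document."""
    for _doc in shard:
        updates.put(1)
    return len(shard)


def run(docs: Sequence[str], num_workers: int) -> Tuple[List[int], List[int]]:
    shards = [docs[i::num_workers] for i in range(num_workers)]
    updates: "queue.Queue[int]" = queue.Queue()
    positions: List[int] = []
    listener = threading.Thread(target=listen, args=(updates, len(docs), positions))
    listener.start()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        counts = list(pool.map(work, shards, [updates] * num_workers))
    listener.join()
    return counts, positions


counts, positions = run([f"doc-{i}" for i in range(20)], num_workers=4)
print(sum(counts), positions[-1])  # 20 20
```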

## Tests

To run the test suite, run the following command from the root directory. Tests are skipped if the optional libraries they require are not installed:

```bash
python -m unittest
```
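One common way to make tests skip cleanly when an optional dependency is missing is `unittest.skipUnless` combined with `importlib.util.find_spec`. The sketch below is illustrative only; the test class name is hypothetical and this is not necessarily how this repository's tests are written.

```python
import importlib.util
import unittest

# True only when the optional tqdm extra is installed
HAS_TQDM = importlib.util.find_spec("tqdm") is not None


class TestTqdmIntegration(unittest.TestCase):
    @unittest.skipUnless(HAS_TQDM, "tqdm is not installed")
    def test_disabled_bar_passes_items_through(self):
        from tqdm import tqdm  # imported lazily so the module loads without tqdm

        self.assertEqual(list(tqdm(range(3), disable=True)), [0, 1, 2])
```

Placed in a file matching `test*.py`, such a test is picked up by the default discovery that `python -m unittest` performs, and is reported as skipped rather than failed when tqdm is absent.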

## Limitations

This wrapper cannot create progress bars for API-based embedding tools such as `HuggingFaceInferenceAPIEmbeddings`, because it works by wrapping the texts supplied to the embeddings method, which can't be done when the texts are sent to a remote API for embedding. This module also doesn't yet support all of langchain's embedding classes. If your embedding class isn't supported, please open an issue and I'll take a look when I get time.


            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/wrmthorne/langchain-progress",
    "name": "langchain-progress",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "Langchain,progress,ray,wrapper,langchain_community,multiprocessing,tqdm",
    "author": "William Thorne",
    "author_email": "wthorne1@sheffield.ac.uk",
    "download_url": "https://files.pythonhosted.org/packages/e2/77/c28126edc8074077570a1f4f3d478bbbefc0d22f50fdae8640d03c6e45c3/langchain_progress-0.1.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Wrapper for nicely displaying progress bars for langchain embedding components when using multiprocessing or ray.",
    "version": "0.1.1",
    "project_urls": {
        "Homepage": "https://github.com/wrmthorne/langchain-progress",
        "Repository": "https://github.com/wrmthorne/langchain-progress"
    },
    "split_keywords": [
        "langchain",
        "progress",
        "ray",
        "wrapper",
        "langchain_community",
        "multiprocessing",
        "tqdm"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fb845db0cbf01ea97ea408d0609728664b64ed2fd91471c96769f1818e7a56c3",
                "md5": "bd67ac1589fcaabea6d83d8769441daf",
                "sha256": "45f3c03474a191a2c94c7fbd7ef4cf638538c8507a81455621108f2a707cbae7"
            },
            "downloads": -1,
            "filename": "langchain_progress-0.1.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bd67ac1589fcaabea6d83d8769441daf",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 9678,
            "upload_time": "2024-03-19T13:03:58",
            "upload_time_iso_8601": "2024-03-19T13:03:58.973533Z",
            "url": "https://files.pythonhosted.org/packages/fb/84/5db0cbf01ea97ea408d0609728664b64ed2fd91471c96769f1818e7a56c3/langchain_progress-0.1.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e277c28126edc8074077570a1f4f3d478bbbefc0d22f50fdae8640d03c6e45c3",
                "md5": "6f3e6554016f8f7b8846bd60eed1914f",
                "sha256": "4fcd5a7317c3d60610bde9b9a8a050846e4a646c0d175e6a3a0a4dfbeafc6b5a"
            },
            "downloads": -1,
            "filename": "langchain_progress-0.1.1.tar.gz",
            "has_sig": false,
            "md5_digest": "6f3e6554016f8f7b8846bd60eed1914f",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 8282,
            "upload_time": "2024-03-19T13:04:00",
            "upload_time_iso_8601": "2024-03-19T13:04:00.050003Z",
            "url": "https://files.pythonhosted.org/packages/e2/77/c28126edc8074077570a1f4f3d478bbbefc0d22f50fdae8640d03c6e45c3/langchain_progress-0.1.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-03-19 13:04:00",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "wrmthorne",
    "github_project": "langchain-progress",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "langchain-progress"
}
        