dataframes-haystack


Name: dataframes-haystack
Version: 0.0.2
Home page: None
Summary: Haystack custom components for your favourite dataframe library.
Upload time: 2024-07-28 21:50:56
Maintainer: None
Docs URL: None
Author: None
Requires Python: >=3.8
License: MIT License Copyright (c) 2024-present Edoardo Abati <29585319+EdAbati@users.noreply.github.com> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Keywords: ai, dataframe, haystack, llm, machine-learning, nlp, pandas, polars
Requirements: No requirements were recorded.
Travis-CI: No Travis.
Coveralls test coverage: No coveralls.
# Dataframes Haystack

[![PyPI - Version](https://img.shields.io/pypi/v/dataframes-haystack)](https://pypi.org/project/dataframes-haystack)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/dataframes-haystack?logo=python&logoColor=white)](https://pypi.org/project/dataframes-haystack)
[![PyPI - License](https://img.shields.io/pypi/l/dataframes-haystack)](https://pypi.org/project/dataframes-haystack)


[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

[![GH Actions Tests](https://github.com/EdAbati/dataframes-haystack/actions/workflows/test.yml/badge.svg)](https://github.com/EdAbati/dataframes-haystack/actions/workflows/test.yml)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/EdAbati/dataframes-haystack/main.svg)](https://results.pre-commit.ci/latest/github/EdAbati/dataframes-haystack/main)

-----

## 📃 Description

`dataframes-haystack` is an extension for [Haystack 2](https://docs.haystack.deepset.ai/docs/intro) that enables integration with dataframe libraries.

The dataframe libraries currently supported are:
- [pandas](https://pandas.pydata.org/)
- [Polars](https://pola.rs)

The library offers custom [Converter](https://docs.haystack.deepset.ai/docs/converters) components that read files into dataframes and turn dataframes into Haystack [`Document`](https://docs.haystack.deepset.ai/docs/data-classes#document) objects:
- `FileToPandasDataFrame` and `FileToPolarsDataFrame` read files and convert them into dataframes.
- `PandasDataFrameConverter` and `PolarsDataFrameConverter` convert data stored in dataframes into Haystack `Document` objects.

## 🛠️ Installation

```sh
# for pandas (pandas is already included in `haystack-ai`)
pip install dataframes-haystack

# for polars
pip install "dataframes-haystack[polars]"
```
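To confirm which dataframe backends are importable after installation (for example, before choosing between the pandas and Polars components), a small stdlib-only check can help. `available_backends` is a hypothetical helper, not part of `dataframes-haystack`; it only asks Python's import system whether the `pandas` and `polars` modules can be located, without importing them.

```python
import importlib.util
from typing import Dict, Iterable


def available_backends(modules: Iterable[str] = ("pandas", "polars")) -> Dict[str, bool]:
    """Report which dataframe libraries can be imported (hypothetical helper).

    Uses importlib.util.find_spec, so the modules are located but not imported.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in available_backends().items():
        # A missing "polars" usually means the `[polars]` extra was not installed.
        print(f"{name}: {'installed' if found else 'missing'}")
```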

## 💻 Usage

> [!TIP]
> See the [Example Notebooks](./notebooks) for complete examples.

### Pandas

#### FileToPandasDataFrame

```python
from dataframes_haystack.components.converters.pandas import FileToPandasDataFrame

converter = FileToPandasDataFrame(file_format="csv")

output_dataframe = converter.run(
    file_paths=["data/doc1.csv", "data/doc2.csv"]
)
```

Result:
```python
>>> output_dataframe
{'dataframe': <pandas.DataFrame>}
```
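Two input paths produce a single `dataframe` output, which suggests the component combines the files into one table. The stdlib-only sketch below illustrates that idea under stated assumptions: every CSV shares the same header, and rows are concatenated in the order of `file_paths`. `read_csv_files` is a hypothetical name, and a list of row dicts stands in for the pandas (or Polars) `DataFrame` that the real component returns.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional, Sequence, Union


def read_csv_files(file_paths: Sequence[Union[str, Path]]) -> Dict[str, List[Dict[str, str]]]:
    """Read several CSV files and concatenate their rows (hypothetical sketch).

    Mirrors the converter's output shape, a dict with a single "dataframe" key,
    but holds a list of row dicts instead of a real DataFrame.
    """
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for path in file_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            fieldnames = list(reader.fieldnames or [])
            # Assumption: files can only be stacked if they share the same columns.
            if header is None:
                header = fieldnames
            elif fieldnames != header:
                raise ValueError(f"{path} has columns {fieldnames}, expected {header}")
            rows.extend(dict(row) for row in reader)
    return {"dataframe": rows}
```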

#### PandasDataFrameConverter

```python
import pandas as pd

from dataframes_haystack.components.converters.pandas import PandasDataFrameConverter

df = pd.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PandasDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
```

Result:
```python
>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
```
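Judging from the output above, the converter emits one `Document` per row, taking `content` from `content_column` and copying each of `meta_columns` into `meta`. The plain-Python sketch below reproduces that mapping so the semantics are easy to check. `SimpleDocument` and `rows_to_documents` are hypothetical stand-ins for Haystack's `Document` and the library's converter, and the rows are dicts rather than a `DataFrame`; document IDs are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for haystack.Document, with only content and meta."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    rows: Iterable[Dict[str, Any]],
    content_column: str,
    meta_columns: Optional[List[str]] = None,
) -> Dict[str, List[SimpleDocument]]:
    """Map each row to one document: content from one column, meta from the others."""
    meta_columns = meta_columns or []
    documents = [
        SimpleDocument(
            # Coerce to str because a document's content is text.
            content=str(row[content_column]),
            meta={column: row[column] for column in meta_columns},
        )
        for row in rows
    ]
    # Same output shape as the converter: a dict with a "documents" key.
    return {"documents": documents}


rows = [
    {"text": "Hello world", "filename": "doc1.txt"},
    {"text": "Hello everyone", "filename": "doc2.txt"},
]
result = rows_to_documents(rows, content_column="text", meta_columns=["filename"])
```

Rows in this shape can be obtained from a pandas DataFrame with `df.to_dict(orient="records")`, or from a Polars DataFrame with `df.to_dicts()`.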

### Polars

#### FileToPolarsDataFrame

```python
from dataframes_haystack.components.converters.polars import FileToPolarsDataFrame

converter = FileToPolarsDataFrame(file_format="csv")

output_dataframe = converter.run(
    file_paths=["data/doc1.csv", "data/doc2.csv"]
)
```

Result:
```python
>>> output_dataframe
{'dataframe': <polars.DataFrame>}
```

#### PolarsDataFrameConverter

```python
import polars as pl

from dataframes_haystack.components.converters.polars import PolarsDataFrameConverter

df = pl.DataFrame({
    "text": ["Hello world", "Hello everyone"],
    "filename": ["doc1.txt", "doc2.txt"],
})

converter = PolarsDataFrameConverter(content_column="text", meta_columns=["filename"])
documents = converter.run(df)
```

Result:
```python
>>> documents
{'documents': [
    Document(id=0, content: 'Hello world', meta: {'filename': 'doc1.txt'}),
    Document(id=1, content: 'Hello everyone', meta: {'filename': 'doc2.txt'})
]}
```
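Both converters take the same `content_column` and `meta_columns` arguments, and both pandas and Polars expose column names through `df.columns`, so a single pre-flight check can cover either library. `check_columns` is a hypothetical helper, not part of `dataframes-haystack`, and it makes no assumption about how the library itself reports a missing column.

```python
from typing import Iterable, List, Optional


def check_columns(
    columns: Iterable[str],
    content_column: str,
    meta_columns: Optional[List[str]] = None,
) -> None:
    """Raise a clear error if any requested column is absent (hypothetical helper).

    Pass df.columns from either a pandas or a Polars DataFrame.
    """
    available = set(columns)
    requested = [content_column, *(meta_columns or [])]
    missing = [name for name in requested if name not in available]
    if missing:
        raise KeyError(f"Missing column(s) {missing}; available: {sorted(available)}")
```

For instance, calling `check_columns(df.columns, content_column="text", meta_columns=["filename"])` before `converter.run(df)` surfaces a misspelled column name up front.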

## 🤝 Contributing

Do you have an idea for a new feature? Did you find a bug that needs fixing?

Feel free to [open an issue](https://github.com/EdAbati/dataframes-haystack/issues) or submit a PR!

### Setup development environment

Requirements: [`hatch`](https://hatch.pypa.io/latest/install/), [`pre-commit`](https://pre-commit.com/#install)

1. Clone the repository
1. Run `hatch shell` to create and activate a virtual environment
1. Run `pre-commit install` to install the pre-commit hooks, which run the linting and formatting checks on every commit.

### Run tests

- Linting and formatting checks: `hatch run lint:fmt`
- Unit tests: `hatch run test-cov-all`

## ✍️ License

`dataframes-haystack` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "dataframes-haystack",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.8",
    "maintainer_email": null,
    "keywords": "ai, dataframe, haystack, llm, machine-learning, nlp, pandas, polars",
    "author": null,
    "author_email": "Edoardo Abati <29585319+EdAbati@users.noreply.github.com>",
    "download_url": "https://files.pythonhosted.org/packages/21/07/688833c253328e9f6c5d131ff49af0f7c657d275db42d80174342ccecac3/dataframes_haystack-0.0.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT License  Copyright (c) 2024-present Edoardo Abati <29585319+EdAbati@users.noreply.github.com>  Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:  The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.  THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.",
    "summary": "Haystack custom components for your favourite dataframe library.",
    "version": "0.0.2",
    "project_urls": {
        "Documentation": "https://github.com/EdAbati/dataframes-haystack#readme",
        "Issues": "https://github.com/EdAbati/dataframes-haystack/issues",
        "Source": "https://github.com/EdAbati/dataframes-haystack"
    },
    "split_keywords": [
        "ai",
        " dataframe",
        " haystack",
        " llm",
        " machine-learning",
        " nlp",
        " pandas",
        " polars"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "7104be9076ea94d9f9da74021c011eb9a904c48f354d07e0fefb3e0a95f916a7",
                "md5": "d96365cafeb0fa528905a421423daf73",
                "sha256": "68b7f350909d29a50e6ea0e584face3feedd2079a2a60b6fab619929a893737c"
            },
            "downloads": -1,
            "filename": "dataframes_haystack-0.0.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "d96365cafeb0fa528905a421423daf73",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8",
            "size": 9704,
            "upload_time": "2024-07-28T21:50:55",
            "upload_time_iso_8601": "2024-07-28T21:50:55.056333Z",
            "url": "https://files.pythonhosted.org/packages/71/04/be9076ea94d9f9da74021c011eb9a904c48f354d07e0fefb3e0a95f916a7/dataframes_haystack-0.0.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "2107688833c253328e9f6c5d131ff49af0f7c657d275db42d80174342ccecac3",
                "md5": "826589aaedd0edd6ab97f4a446f1922a",
                "sha256": "442a1ad00d3dafbddbd933d3bf72dbdabfa9249b62978592263169354a3ee844"
            },
            "downloads": -1,
            "filename": "dataframes_haystack-0.0.2.tar.gz",
            "has_sig": false,
            "md5_digest": "826589aaedd0edd6ab97f4a446f1922a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8",
            "size": 117855,
            "upload_time": "2024-07-28T21:50:56",
            "upload_time_iso_8601": "2024-07-28T21:50:56.384767Z",
            "url": "https://files.pythonhosted.org/packages/21/07/688833c253328e9f6c5d131ff49af0f7c657d275db42d80174342ccecac3/dataframes_haystack-0.0.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-07-28 21:50:56",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "EdAbati",
    "github_project": "dataframes-haystack#readme",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "dataframes-haystack"
}
        
Elapsed time: 0.53664s