# ragdata: Build knowledge bases for RAG
<p align="center">
<a href="https://github.com/neuml/ragdata/releases">
<img src="https://img.shields.io/github/release/neuml/ragdata.svg?style=flat&color=success" alt="Version"/>
</a>
<a href="https://github.com/neuml/ragdata/releases">
<img src="https://img.shields.io/github/release-date/neuml/ragdata.svg?style=flat&color=blue" alt="GitHub Release Date"/>
</a>
<a href="https://github.com/neuml/ragdata/issues">
<img src="https://img.shields.io/github/issues/neuml/ragdata.svg?style=flat&color=success" alt="GitHub issues"/>
</a>
<a href="https://github.com/neuml/ragdata">
<img src="https://img.shields.io/github/last-commit/neuml/ragdata.svg?style=flat&color=blue" alt="GitHub last commit"/>
</a>
</p>
`ragdata` builds knowledge bases for Retrieval Augmented Generation (RAG).
This project provides processes for building [txtai](https://github.com/neuml/txtai) embeddings databases from common datasets.
The currently supported datasets are:
- [ArXiv](https://huggingface.co/NeuML/txtai-arxiv)
- [Wikipedia](https://huggingface.co/NeuML/txtai-wikipedia)
Each link above includes full instructions for building that dataset, including how to use this project.
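As a rough illustration of what these knowledge bases are for, the sketch below loads one of the prebuilt databases linked above with txtai and runs a search. It assumes `txtai` is installed separately and that the index is fetched from the Hugging Face Hub, which is a large download. The helper name `load_knowledge_base` and the sample query are hypothetical and not part of `ragdata`.

```python
# Minimal sketch: query a prebuilt txtai embeddings database.
# Assumption: txtai is installed (pip install txtai) and network access is available.

# Hugging Face Hub repositories linked in the list above
DATASETS = {
    "arxiv": "neuml/txtai-arxiv",
    "wikipedia": "neuml/txtai-wikipedia",
}


def load_knowledge_base(name="wikipedia"):
    """Load a prebuilt embeddings database from the Hugging Face Hub.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module can be imported without txtai present
    from txtai import Embeddings

    embeddings = Embeddings()
    embeddings.load(provider="huggingface-hub", container=DATASETS[name])
    return embeddings


if __name__ == "__main__":
    embeddings = load_knowledge_base("wikipedia")

    # Return the single best match for a sample query
    print(embeddings.search("Roman Empire", 1))
```

The retrieved results can then be passed as context to a language model, which is the retrieval step of RAG.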
## Installation
The easiest way to install is via pip and PyPI:
```
pip install ragdata
```
Python 3.9+ is supported. Using a Python [virtual environment](https://docs.python.org/3/library/venv.html) is recommended.
`ragdata` can also be installed directly from GitHub to access the latest, unreleased features.
```
pip install git+https://github.com/neuml/ragdata
```
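After installing, a quick check like the following can confirm that the interpreter meets the Python 3.9+ requirement and that the package is visible to it. This is a minimal sketch using only the standard library. The function name `check_install` is an illustrative choice, not something `ragdata` provides.

```python
import sys
from importlib import metadata

# Minimum interpreter version, taken from the "Python 3.9+ is supported" note
MIN_PYTHON = (3, 9)


def check_install(package="ragdata"):
    """Return (python_ok, installed_version), where version is None if not installed."""
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


if __name__ == "__main__":
    python_ok, version = check_install()
    print(f"Python version supported: {python_ok}")
    print(f"ragdata version: {version or 'not installed'}")
```

Running this inside the virtual environment used for installation helps catch the common case where `pip` and `python` point at different environments.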
## Raw data
{
"_id": null,
"home_page": "https://github.com/neuml/ragdata",
"name": "ragdata",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "search embedding machine-learning nlp",
"author": "NeuML",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/f1/a8/0b39b17362e5fac98d28caa2f499264f4a47c54c46de9a00c804e51fed7f/ragdata-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "Apache 2.0: http://www.apache.org/licenses/LICENSE-2.0",
"summary": "Build knowledge bases for RAG",
"version": "0.1.0",
"project_urls": {
"Documentation": "https://github.com/neuml/ragdata",
"Homepage": "https://github.com/neuml/ragdata",
"Issue Tracker": "https://github.com/neuml/ragdata/issues",
"Source Code": "https://github.com/neuml/ragdata"
},
"split_keywords": [
"search",
"embedding",
"machine-learning",
"nlp"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "1cfabfd461b487b55e75d9064a2feab69cc60be2b54f8cecc11e4070c199ac2b",
"md5": "51bf33d9470d7052cbe64d09bcdd565b",
"sha256": "0f20c393d1ac95c33c424ae89d90aa75050e96c0a980f12695b132f9daee079d"
},
"downloads": -1,
"filename": "ragdata-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "51bf33d9470d7052cbe64d09bcdd565b",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 12842,
"upload_time": "2024-12-18T13:25:18",
"upload_time_iso_8601": "2024-12-18T13:25:18.176004Z",
"url": "https://files.pythonhosted.org/packages/1c/fa/bfd461b487b55e75d9064a2feab69cc60be2b54f8cecc11e4070c199ac2b/ragdata-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "f1a80b39b17362e5fac98d28caa2f499264f4a47c54c46de9a00c804e51fed7f",
"md5": "a492f9aa82e4c1595e34d97e63fd55d4",
"sha256": "8e73ede9d2f235821fed8d27bbcaa2627fcbf3992b08273251f3dfa6ce10dd71"
},
"downloads": -1,
"filename": "ragdata-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "a492f9aa82e4c1595e34d97e63fd55d4",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 11863,
"upload_time": "2024-12-18T13:25:19",
"upload_time_iso_8601": "2024-12-18T13:25:19.542764Z",
"url": "https://files.pythonhosted.org/packages/f1/a8/0b39b17362e5fac98d28caa2f499264f4a47c54c46de9a00c804e51fed7f/ragdata-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-18 13:25:19",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "neuml",
"github_project": "ragdata",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "ragdata"
}