# RAGchain
RAGchain is a framework for developing advanced RAG (Retrieval-Augmented Generation) workflows powered by LLMs (Large Language Models).
While existing frameworks like Langchain or LlamaIndex let you build simple RAG workflows, they have limitations when it comes to building complex, high-accuracy ones.
RAGchain is designed to overcome these limitations by providing powerful features for building advanced RAG workflows easily.
It is also partially compatible with Langchain, so you can leverage many of its integrations for vector stores, embeddings, document loaders, and LLMs.
[Docs](https://nomadamas.gitbook.io/ragchain-docs/) | [API Spec](https://marker-inc-korea.github.io/RAGchain/build/html/index.html) | [QuickStart](https://nomadamas.gitbook.io/ragchain-docs/quick-start)
# Quick Install
```bash
pip install RAGchain
```
# Why RAGchain?
RAGchain offers several powerful features for building high-quality RAG workflows:
## OCR Loaders
Simple file loaders may not be sufficient when trying to enhance accuracy or ingest real-world documents. OCR models can scan documents and convert them into text with high accuracy, improving the quality of responses from LLMs.
## Reranker
Reranking is a popular method used in many research projects to improve retrieval accuracy in RAG workflows. Unlike Langchain, which doesn't include reranking as a default feature, RAGchain ships with various rerankers.
## Optimized for multiple retrievers
In real-world scenarios, you may need multiple retrievers depending on your requirements. RAGchain is highly optimized for this: it separates retrieval from the DB. A retrieval stores only the searchable representation (e.g. vectors) of contents, while the DB stores the contents themselves, and a Linker connects the two. This makes it easy to combine multiple retrievers and DBs.
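The retrieval/DB split described above can be sketched roughly as follows. All class and method names here are illustrative stand-ins, not RAGchain's actual API, and the "retrieval" uses naive keyword overlap instead of vectors to keep the sketch self-contained:

```python
class InMemoryDB:
    """Stores full passage contents keyed by id (stand-in for a real DB)."""
    def __init__(self):
        self._contents = {}

    def save(self, doc_id, content):
        self._contents[doc_id] = content

    def fetch(self, doc_ids):
        return [self._contents[i] for i in doc_ids]


class KeywordRetrieval:
    """Stores only a searchable representation and returns ids, never contents."""
    def __init__(self):
        self._index = {}

    def ingest(self, doc_id, content):
        self._index[doc_id] = set(content.lower().split())

    def retrieve_ids(self, query, top_k=2):
        q = set(query.lower().split())
        ranked = sorted(self._index, key=lambda i: len(q & self._index[i]), reverse=True)
        return ranked[:top_k]


class Linker:
    """Connects any retrieval (which yields ids) to the DB that holds contents."""
    def __init__(self, retrieval, db):
        self.retrieval, self.db = retrieval, db

    def retrieve(self, query, top_k=2):
        return self.db.fetch(self.retrieval.retrieve_ids(query, top_k))


db, retrieval = InMemoryDB(), KeywordRetrieval()
for doc_id, content in {"d1": "cats sleep all day",
                        "d2": "dogs chase the mail truck"}.items():
    db.save(doc_id, content)
    retrieval.ingest(doc_id, content)

linker = Linker(retrieval, db)
results = linker.retrieve("why do dogs chase trucks", top_k=1)
```

Because every retrieval only hands back ids, swapping in a second retrieval (or a second DB) behind the same Linker interface requires no change to the calling code.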
## Pre-made RAG pipelines
We provide pre-made pipelines that let you set up a RAG workflow quickly. We also plan to add more complex pipelines that are hard to build by hand but powerful. With pipelines, you can build a powerful RAG system quickly and easily.
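As a rough illustration of the idea (the function names below are invented for this sketch, not RAGchain's actual API), a basic pipeline is essentially retrieve-then-generate, with stub components standing in for a real retriever and LLM:

```python
def basic_pipeline(query, retrieve, generate, top_k=3):
    """Minimal retrieve-then-generate pipeline: fetch passages, build a prompt, ask the LLM."""
    passages = retrieve(query, top_k)
    prompt = "Answer using these passages:\n" + "\n".join(passages) + "\nQ: " + query
    return generate(prompt)


# Stubs so the sketch runs without any external service.
def retrieve(query, top_k):
    corpus = ["RAGchain supports rerankers.", "Paris is in France.", "BM25 is a retriever."]
    return corpus[:top_k]


def generate(prompt):
    return "stub answer for: " + prompt.splitlines()[-1]


answer = basic_pipeline("What does RAGchain support?", retrieve, generate)
```

More advanced pipelines (e.g. Visconde or Rerank, listed under Workflows below) slot extra steps such as query decomposition or reranking between the retrieve and generate stages.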
## Easy benchmarking
It is crucial to benchmark and test your RAG workflows. We provide an easy benchmarking module for evaluation, which supports both your own questions and various public datasets.
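A typical retrieval metric such a benchmark would compute is recall@k: the fraction of queries whose gold passage appears in the top-k results. A minimal, library-free sketch (the data here is made up for illustration):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of queries whose gold passage id appears in the top-k retrieved ids."""
    hits = sum(1 for got, gold in zip(retrieved, relevant) if gold in got[:k])
    return hits / len(relevant)


# For each query: the ids your system returned, and the single gold id.
retrieved = [["d1", "d7", "d3"], ["d2", "d5", "d9"], ["d8", "d4", "d6"]]
relevant = ["d3", "d5", "d1"]
score = recall_at_k(retrieved, relevant, k=3)  # d3 and d5 are found, d1 is missed
```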
# Installation
## From pip
Simply install from PyPI.
```bash
pip install RAGchain
```
## From source
First, clone this git repository to your local machine.
```bash
git clone https://github.com/Marker-Inc-Korea/RAGchain.git
cd RAGchain
```
Then, install RAGchain module.
```bash
python3 setup.py develop
```
To use the files at the repository root and run the tests, install the dev requirements.
```bash
pip install -r dev_requirements.txt
```
# Supporting Features
## Advanced RAG features
- [Time-Aware RAG](https://nomadamas.gitbook.io/ragchain-docs/advanced-rag/time_aware_rag)
- [Importance-Aware RAG]()
## Retrievals
- BM25
- Vector DB
- Hybrid ([rrf](https://www.elastic.co/guide/en/elasticsearch/reference/current/rrf.html) and [cc](https://arxiv.org/abs/2210.11934))
- [HyDE](https://arxiv.org/abs/2212.10496)
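The hybrid retriever's rrf option refers to reciprocal rank fusion, which merges ranked lists from several retrievers (e.g. BM25 and a vector DB) by scoring each document as the sum of 1/(k + rank) over the lists it appears in. A minimal sketch with made-up document ids:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: each doc scores sum(1 / (k + rank)) across lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_hits = ["d3", "d1", "d2"]      # ranking from a keyword retriever
vector_hits = ["d1", "d4", "d3"]    # ranking from a vector retriever
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])  # ["d1", "d3", "d4", "d2"]
```

Documents ranked highly by several retrievers (here d1 and d3) float to the top, while k = 60 (the constant from the original RRF paper) damps the influence of any single top rank.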
## OCR Loaders
- [Nougat](https://github.com/facebookresearch/nougat)
- [Deepdoctection](https://github.com/deepdoctection/deepdoctection)
## Rerankers
- [UPR](https://github.com/DevSinghSachan/unsupervised-passage-reranking)
- [TART](https://github.com/facebookresearch/tart)
- BM25
- LLM
- [MonoT5](https://huggingface.co/castorini/monot5-3b-msmarco-10k)
## Web Search
- Google Search
- Bing Search
## Workflows (pipeline)
- Basic
- [Visconde](https://arxiv.org/abs/2212.09656)
- Rerank
- Google Search
## Extra utils
- Query Decomposition
- Evidence Extractor
- [REDE](https://arxiv.org/pdf/2109.08820.pdf) Search Detector
- Semantic Clustering
- Cluster Time Compressor
## Dataset Evaluators
- [MS-MARCO](https://paperswithcode.com/dataset/ms-marco)
- [Mr. Tydi](https://arxiv.org/abs/2108.08787)
- [Qasper](https://paperswithcode.com/dataset/qasper)
- [StrategyQA](https://allenai.org/data/strategyqa)
- [KoStrategyQA](https://huggingface.co/datasets/NomaDamas/Ko-StrategyQA)
- [ANTIQUE](https://paperswithcode.com/dataset/antique)
- [ASQA](https://arxiv.org/abs/2204.06092)
- [DSTC11-Track5](https://github.com/alexa/dstc11-track5)
- [Natural QA](https://paperswithcode.com/dataset/natural-questions)
- [NFCorpus](http://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/)
- [SearchQA](https://arxiv.org/abs/1704.05179)
- [TriviaQA](https://arxiv.org/abs/1705.03551)
- [ELI5](https://huggingface.co/datasets/Pakulski/ELI5-test)
# Contributing
We welcome any contributions. Please feel free to raise issues and submit pull requests.
# Acknowledgement
This project is an early version, so it may be unstable. The project is licensed under the Apache 2.0 License.