We believe that **hallucinations** pose a major problem for the adoption of LLMs (Large Language Models).
It is imperative to give users a simple and quick way to verify the coherence of the answers
to the questions they ask. Keep in mind that even answers to questions can be subject to hallucinations.
The conventional approach is to provide a list of URLs of the documents that contributed to the answer (see `qa_with_sources`).
However, this approach is unsatisfactory in several scenarios:
1. The question is asked about a PDF of over 100 pages. Every fragment comes from the same document, so a single URL does not say *where* in the document the answer comes from.
2. Some documents do not have URLs (data retrieved from a database or other *loaders*).
Other technical considerations also make the URL approach tricky:
prompts do not handle complex URLs well, the URLs consume a
huge number of tokens, and in the end the result is too big.
It appears essential to have a means of retrieving all references to the actual data sources
used by the model to answer the question.
It is better to return a list of `Documents` than a list of URLs.
This includes:
- The precise list of documents actually used for the answer (the `Documents`, along with their metadata, which may contain
page numbers, slide numbers, or any other information that allows the fragment to be located in the original document).
- The verbatim excerpts of text used for the answer in each fragment. Even when a fragment is used, the LLM only exploits a
small portion of it to generate the answer. Access to these excerpts makes it quick to check the validity of the answer
(see the sketch after this list).
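
For illustration, here is a minimal sketch of what such a returned `Document` could look like. The `verbatims` metadata key and the sample values are assumptions made for this example, not the package's verified schema; the actual keys depend on the loader that produced the fragment.

```python
# Minimal sketch of a returned Document; metadata keys depend on the loader,
# and "verbatims" is a hypothetical key used here for illustration.
from langchain.schema import Document

doc = Document(
    page_content="...the retrieved fragment of the 100-page PDF...",
    metadata={
        "source": "big_report.pdf",  # where the fragment comes from
        "page": 42,                  # page number, when the loader provides one
        "verbatims": [               # hypothetical key: excerpts the LLM used
            "the exact sentence quoted in the answer",
        ],
    },
)
print(doc.metadata["page"], doc.metadata["verbatims"])
```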
We propose two pipelines for this purpose: `qa_with_reference` and `qa_with_reference_and_verbatims`.
Both are question-answering pipelines that return the list of documents used and, in the metadata, the list of verbatim
excerpts exploited to produce the answer. They are very similar to `qa_with_sources_chain`, which inspired them.
If a verbatim is not actually from the original document, it is removed.
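
To make the workflow concrete, below is a hedged usage sketch. The class name `RetrievalQAWithReferencesAndVerbatimsChain`, its module path, and the output keys are assumptions modeled on langchain's `RetrievalQAWithSourcesChain`; refer to the sample notebook below for the actual API.

```python
# Hedged sketch: the chain class name, module path, and output keys are
# assumptions modeled on RetrievalQAWithSourcesChain, not a verified API.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Hypothetical import path for this package.
from langchain_qa_with_references.chains import (
    RetrievalQAWithReferencesAndVerbatimsChain,
)

# Build a tiny retriever over two fragments
# (requires faiss-cpu and an OpenAI API key).
retriever = FAISS.from_texts(
    [
        "The Eiffel Tower is 330 meters tall.",
        "The Eiffel Tower was completed in 1889.",
    ],
    OpenAIEmbeddings(),
).as_retriever()

chain = RetrievalQAWithReferencesAndVerbatimsChain.from_chain_type(
    llm=ChatOpenAI(temperature=0),
    retriever=retriever,
)

result = chain({"question": "How tall is the Eiffel Tower?"})
print(result["answer"])

# The documents actually used, each carrying its metadata and (assumed key)
# the verbatim excerpts the LLM relied on.
for doc in result["source_documents"]:
    print(doc.metadata)
```

The intent is that the returned documents are only the fragments the LLM actually exploited, not everything the retriever returned.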
# Install
```
pip install langchain-qa_with_references
```
# Sample notebook
See [here](https://github.com/pprados/langchain-qa_with_references/blob/master/qa_with_reference_and_verbatim.ipynb)
# langchain Pull-request
This is a temporary project while I wait for my langchain
[pull-request](https://github.com/hwchase17/langchain/pull/5135)
to be validated.
# It's experimental
For the moment, the code is being tested in a number of environments to validate and adjust it.
The langchain framework is still very unstable, and some features may become deprecated.
We work to maintain compatibility as far as possible.