# DocsChat 📚🗣️
`docschat` is a command-line interface that lets you start a local Streamlit server and interact with your documents.
The chatbot uses a conversational retrieval chain to answer user queries based on the content of the embedded documents. It combines a language model with document embeddings to return relevant responses.
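To make the idea of a conversational retrieval chain concrete, here is a minimal, dependency-free sketch. It is not DocsChat's implementation: the bag-of-words `embed` function stands in for a real embedding model, and the helper names (`retrieve`, `build_prompt`) are hypothetical. A real chain would typically also rewrite the question using the chat history before retrieval, and would send the final prompt to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, history: list[tuple[str, str]], context: list[str]) -> str:
    """Combine chat history, retrieved context, and the new question into one prompt."""
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Chat history:\n{past}\n\nContext:\n{ctx}\n\nQuestion: {question}\nAnswer:"


# Made-up document chunks, for illustration only.
chunks = [
    "Invoices must be paid within 30 days.",
    "The warranty covers hardware defects for two years.",
    "Support is available on weekdays from 9 to 5.",
]
question = "How long does the warranty last?"
context = retrieve(question, chunks, k=1)
prompt = build_prompt(question, [], context)
print(prompt)
```

The prompt that reaches the model contains only the retrieved chunks, which is why the embedding and retrieval settings described under Configure affect the answers.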
## Features
- **Document Embedding:** Embeds PDF documents for efficient retrieval of information.
- **Conversational Interface:** Allows users to interact with documents through a chat interface.
- **Settings:** Provides customizable settings for configuring document retrieval and model parameters.
## Installation
To run the application locally, first install the package:
```bash
pip install DocsChat
```
Pull one or more Ollama models:
```bash
ollama pull llama3
ollama pull llama2
ollama pull gemma
ollama pull mistral
ollama pull codellama
```
Start the Ollama server:
```bash
ollama run llama3
```
Run the application:
```bash
docschat
```
## Configure
![DocsChat](https://raw.githubusercontent.com/flojud/DocsChat/development/assets/docschat.png)
### PDF sources
- Configure the PDF source directory. All PDFs below it are read recursively.
- Select a splitter. The splitter determines the chunks that are made available to the LLM and therefore also affects the answers. By default no splitter is selected, which results in a larger context.
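The two settings above can be illustrated with a short standard-library sketch. The function names `find_pdfs` and `split_text` are hypothetical and this is not DocsChat's own loader or splitter; the fixed-size character splitter is just one simple strategy, and the chunk size and overlap values are assumptions chosen for the example.

```python
from pathlib import Path


def find_pdfs(source_dir: str) -> list[Path]:
    """Recursively collect all PDF files below source_dir, sorted for a stable order.

    Note: the "*.pdf" pattern is case-sensitive on most Linux file systems.
    """
    return sorted(p for p in Path(source_dir).rglob("*.pdf") if p.is_file())


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Cut text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


page = "word " * 100                         # 500 characters standing in for extracted PDF text
no_splitter = [page]                         # no splitter: one large chunk, more context per hit
with_splitter = split_text(page, 200, 40)    # splitter: several smaller, overlapping chunks
print(len(no_splitter), len(with_splitter))
```

Smaller chunks make retrieval more targeted but give the LLM less surrounding text per hit; the overlap reduces the chance that a sentence is cut in half at a chunk boundary.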
### Vector store
![Vector store](https://raw.githubusercontent.com/flojud/DocsChat/development/assets/vectorestore.png)
- An in-memory Chroma DB is used as the vector store. It stores its data in a persist directory, so the data in the DB is still available after a restart.
- The retriever search type and its parameters influence how documents are searched in the vector store.
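As a rough illustration of how the search type changes the results, the sketch below compares a plain similarity search (with an optional score threshold) to maximal marginal relevance (MMR). The parameter names `k`, `score_threshold`, and `lambda_mult` follow the names LangChain uses for retriever options; whether DocsChat exposes exactly these is an assumption, and the vectors are made-up two-dimensional embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def similarity_search(query, vectors, k=2, score_threshold=0.0):
    """Indices of the k vectors most similar to the query, at or above score_threshold."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored = [(s, i) for s, i in scored if s >= score_threshold]
    return [i for s, i in sorted(scored, key=lambda t: -t[0])[:k]]


def mmr_search(query, vectors, k=2, lambda_mult=0.5):
    """Maximal marginal relevance: balance relevance to the query against redundancy."""
    selected: list[int] = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, vectors[i])
            # Similarity to the closest already-selected vector (0.0 if none yet).
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
vectors = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]  # the first two are near-duplicates
print(similarity_search(query, vectors, k=2))      # picks both near-duplicates
print(mmr_search(query, vectors, k=2))             # picks one of them plus the distinct vector
```

Similarity search returns the closest matches even when they say nearly the same thing, while MMR gives up some relevance to cover more distinct passages.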
### Ollama
![Ollama](https://raw.githubusercontent.com/flojud/DocsChat/development/assets/ollama.png)
- Configure the Ollama server connection and the model with which the server was started.
- The LLM parameters influence both the embedding of the PDFs and the answering of questions in the RAG pipeline.
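For orientation, the following sketch shows how a server URL, a model name, and LLM parameters end up in a request to Ollama's REST API. It uses only the standard library and the `/api/generate` endpoint; DocsChat itself talks to Ollama through LangChain, so this is not its code. The function names and the default values for `temperature` and `num_ctx` are assumptions, and `http://localhost:11434` is Ollama's default address.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "llama3",
    base_url: str = "http://localhost:11434",
    temperature: float = 0.2,
    num_ctx: int = 4096,
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request for Ollama's /api/generate."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_ctx": num_ctx},
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Building the request needs no running server; calling generate() does.
req = build_generate_request("Summarize the warranty terms.", temperature=0.0)
print(req.full_url)
print(req.data.decode("utf-8"))
```

The model named in the request must already be pulled on the server, which is why the configured model should match one of the models pulled during installation.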
## Actions
![Actions](https://raw.githubusercontent.com/flojud/DocsChat/development/assets/actions.png)
Two functions are available. The sync function loads the PDF documents into the vector store; this can take some time depending on the system resources, the embedding, and the splitter. The Delete DB function deletes the Chroma collection.
Raw data
{
"_id": null,
"home_page": "https://github.com/flojud/DocsChat",
"name": "DocsChat",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.10",
"maintainer_email": null,
"keywords": "llama, mistral, chat, streamlit, langchain, talk2docs",
"author": "flojud",
"author_email": "dev.flojud@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/87/c1/761df07ee056667a82e3cbb71297e42e5d40f006c21b38c55b9dbf1769f3/docschat-5.0.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "Chat with your docs using langchain in a streamlit app with mistral or llama in ollama.",
"version": "5.0.0",
"project_urls": {
"Bug Tracker": "https://github.com/flojud/DocsChat/issues",
"Homepage": "https://github.com/flojud/DocsChat"
},
"split_keywords": [
"llama",
" mistral",
" chat",
" streamlit",
" langchain",
" talk2docs"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "3ae430ea653f426f25629b76d415c5b4343637b3b53a049b28e6740ba9628506",
"md5": "ebc7782dd62bb559d62d8c664f824658",
"sha256": "fe6b0553d7fcb86a841c30be024435baffc75d48f9de1c0009289ef5a8a01187"
},
"downloads": -1,
"filename": "DocsChat-5.0.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ebc7782dd62bb559d62d8c664f824658",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.10",
"size": 11299,
"upload_time": "2024-05-13T08:26:16",
"upload_time_iso_8601": "2024-05-13T08:26:16.581057Z",
"url": "https://files.pythonhosted.org/packages/3a/e4/30ea653f426f25629b76d415c5b4343637b3b53a049b28e6740ba9628506/DocsChat-5.0.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "87c1761df07ee056667a82e3cbb71297e42e5d40f006c21b38c55b9dbf1769f3",
"md5": "f1eacc3d74a174242c07739df4c3bdc0",
"sha256": "4e0d95ef02b1a0589d898f80967a864062aeeffa57608306958b98918dd5951b"
},
"downloads": -1,
"filename": "docschat-5.0.0.tar.gz",
"has_sig": false,
"md5_digest": "f1eacc3d74a174242c07739df4c3bdc0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.10",
"size": 10599,
"upload_time": "2024-05-13T08:26:17",
"upload_time_iso_8601": "2024-05-13T08:26:17.495832Z",
"url": "https://files.pythonhosted.org/packages/87/c1/761df07ee056667a82e3cbb71297e42e5d40f006c21b38c55b9dbf1769f3/docschat-5.0.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-13 08:26:17",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "flojud",
"github_project": "DocsChat",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"requirements": [
{
"name": "aiohttp",
"specs": [
[
"==",
"3.9.5"
]
]
},
{
"name": "aiosignal",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "altair",
"specs": [
[
"==",
"5.3.0"
]
]
},
{
"name": "annotated-types",
"specs": [
[
"==",
"0.6.0"
]
]
},
{
"name": "anyio",
"specs": [
[
"==",
"4.3.0"
]
]
},
{
"name": "asgiref",
"specs": [
[
"==",
"3.8.1"
]
]
},
{
"name": "async-timeout",
"specs": [
[
"==",
"4.0.3"
]
]
},
{
"name": "attrs",
"specs": [
[
"==",
"23.2.0"
]
]
},
{
"name": "backoff",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "bcrypt",
"specs": [
[
"==",
"4.1.3"
]
]
},
{
"name": "blinker",
"specs": [
[
"==",
"1.8.2"
]
]
},
{
"name": "build",
"specs": [
[
"==",
"1.2.1"
]
]
},
{
"name": "cachetools",
"specs": [
[
"==",
"5.3.3"
]
]
},
{
"name": "certifi",
"specs": [
[
"==",
"2024.2.2"
]
]
},
{
"name": "charset-normalizer",
"specs": [
[
"==",
"3.3.2"
]
]
},
{
"name": "chroma-hnswlib",
"specs": [
[
"==",
"0.7.3"
]
]
},
{
"name": "chromadb",
"specs": [
[
"==",
"0.4.24"
]
]
},
{
"name": "click",
"specs": [
[
"==",
"8.1.7"
]
]
},
{
"name": "coloredlogs",
"specs": [
[
"==",
"15.0.1"
]
]
},
{
"name": "dataclasses-json",
"specs": [
[
"==",
"0.6.6"
]
]
},
{
"name": "Deprecated",
"specs": [
[
"==",
"1.2.14"
]
]
},
{
"name": "dnspython",
"specs": [
[
"==",
"2.6.1"
]
]
},
{
"name": "email_validator",
"specs": [
[
"==",
"2.1.1"
]
]
},
{
"name": "exceptiongroup",
"specs": [
[
"==",
"1.2.1"
]
]
},
{
"name": "fastapi",
"specs": [
[
"==",
"0.111.0"
]
]
},
{
"name": "fastapi-cli",
"specs": [
[
"==",
"0.0.3"
]
]
},
{
"name": "filelock",
"specs": [
[
"==",
"3.14.0"
]
]
},
{
"name": "flatbuffers",
"specs": [
[
"==",
"24.3.25"
]
]
},
{
"name": "frozenlist",
"specs": [
[
"==",
"1.4.1"
]
]
},
{
"name": "fsspec",
"specs": [
[
"==",
"2024.3.1"
]
]
},
{
"name": "gitdb",
"specs": [
[
"==",
"4.0.11"
]
]
},
{
"name": "GitPython",
"specs": [
[
"==",
"3.1.43"
]
]
},
{
"name": "google-auth",
"specs": [
[
"==",
"2.29.0"
]
]
},
{
"name": "googleapis-common-protos",
"specs": [
[
"==",
"1.63.0"
]
]
},
{
"name": "grpcio",
"specs": [
[
"==",
"1.63.0"
]
]
},
{
"name": "h11",
"specs": [
[
"==",
"0.14.0"
]
]
},
{
"name": "httpcore",
"specs": [
[
"==",
"1.0.5"
]
]
},
{
"name": "httptools",
"specs": [
[
"==",
"0.6.1"
]
]
},
{
"name": "httpx",
"specs": [
[
"==",
"0.27.0"
]
]
},
{
"name": "huggingface-hub",
"specs": [
[
"==",
"0.23.0"
]
]
},
{
"name": "humanfriendly",
"specs": [
[
"==",
"10.0"
]
]
},
{
"name": "idna",
"specs": [
[
"==",
"3.7"
]
]
},
{
"name": "importlib-metadata",
"specs": [
[
"==",
"7.0.0"
]
]
},
{
"name": "importlib_resources",
"specs": [
[
"==",
"6.4.0"
]
]
},
{
"name": "install",
"specs": [
[
"==",
"1.3.5"
]
]
},
{
"name": "Jinja2",
"specs": [
[
"==",
"3.1.4"
]
]
},
{
"name": "jsonpatch",
"specs": [
[
"==",
"1.33"
]
]
},
{
"name": "jsonpointer",
"specs": [
[
"==",
"2.4"
]
]
},
{
"name": "jsonschema",
"specs": [
[
"==",
"4.22.0"
]
]
},
{
"name": "jsonschema-specifications",
"specs": [
[
"==",
"2023.12.1"
]
]
},
{
"name": "kubernetes",
"specs": [
[
"==",
"29.0.0"
]
]
},
{
"name": "langchain",
"specs": [
[
"==",
"0.1.19"
]
]
},
{
"name": "langchain-chroma",
"specs": [
[
"==",
"0.1.0"
]
]
},
{
"name": "langchain-community",
"specs": [
[
"==",
"0.0.38"
]
]
},
{
"name": "langchain-core",
"specs": [
[
"==",
"0.1.52"
]
]
},
{
"name": "langchain-experimental",
"specs": [
[
"==",
"0.0.58"
]
]
},
{
"name": "langchain-text-splitters",
"specs": [
[
"==",
"0.0.1"
]
]
},
{
"name": "langsmith",
"specs": [
[
"==",
"0.1.56"
]
]
},
{
"name": "markdown-it-py",
"specs": [
[
"==",
"3.0.0"
]
]
},
{
"name": "MarkupSafe",
"specs": [
[
"==",
"2.1.5"
]
]
},
{
"name": "marshmallow",
"specs": [
[
"==",
"3.21.2"
]
]
},
{
"name": "mdurl",
"specs": [
[
"==",
"0.1.2"
]
]
},
{
"name": "mmh3",
"specs": [
[
"==",
"4.1.0"
]
]
},
{
"name": "monotonic",
"specs": [
[
"==",
"1.6"
]
]
},
{
"name": "mpmath",
"specs": [
[
"==",
"1.3.0"
]
]
},
{
"name": "multidict",
"specs": [
[
"==",
"6.0.5"
]
]
},
{
"name": "mypy-extensions",
"specs": [
[
"==",
"1.0.0"
]
]
},
{
"name": "numpy",
"specs": [
[
"==",
"1.26.4"
]
]
},
{
"name": "oauthlib",
"specs": [
[
"==",
"3.2.2"
]
]
},
{
"name": "onnxruntime",
"specs": [
[
"==",
"1.17.3"
]
]
},
{
"name": "opentelemetry-api",
"specs": [
[
"==",
"1.24.0"
]
]
},
{
"name": "opentelemetry-exporter-otlp-proto-common",
"specs": [
[
"==",
"1.24.0"
]
]
},
{
"name": "opentelemetry-exporter-otlp-proto-grpc",
"specs": [
[
"==",
"1.24.0"
]
]
},
{
"name": "opentelemetry-instrumentation",
"specs": [
[
"==",
"0.45b0"
]
]
},
{
"name": "opentelemetry-instrumentation-asgi",
"specs": [
[
"==",
"0.45b0"
]
]
},
{
"name": "opentelemetry-instrumentation-fastapi",
"specs": [
[
"==",
"0.45b0"
]
]
},
{
"name": "opentelemetry-proto",
"specs": [
[
"==",
"1.24.0"
]
]
},
{
"name": "opentelemetry-sdk",
"specs": [
[
"==",
"1.24.0"
]
]
},
{
"name": "opentelemetry-semantic-conventions",
"specs": [
[
"==",
"0.45b0"
]
]
},
{
"name": "opentelemetry-util-http",
"specs": [
[
"==",
"0.45b0"
]
]
},
{
"name": "orjson",
"specs": [
[
"==",
"3.10.3"
]
]
},
{
"name": "overrides",
"specs": [
[
"==",
"7.7.0"
]
]
},
{
"name": "packaging",
"specs": [
[
"==",
"23.2"
]
]
},
{
"name": "pandas",
"specs": [
[
"==",
"2.2.2"
]
]
},
{
"name": "pillow",
"specs": [
[
"==",
"10.3.0"
]
]
},
{
"name": "posthog",
"specs": [
[
"==",
"3.5.0"
]
]
},
{
"name": "protobuf",
"specs": [
[
"==",
"4.25.3"
]
]
},
{
"name": "pulsar-client",
"specs": [
[
"==",
"3.5.0"
]
]
},
{
"name": "pyarrow",
"specs": [
[
"==",
"16.0.0"
]
]
},
{
"name": "pyasn1",
"specs": [
[
"==",
"0.6.0"
]
]
},
{
"name": "pyasn1_modules",
"specs": [
[
"==",
"0.4.0"
]
]
},
{
"name": "pydantic",
"specs": [
[
"==",
"2.7.1"
]
]
},
{
"name": "pydantic_core",
"specs": [
[
"==",
"2.18.2"
]
]
},
{
"name": "pydeck",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "Pygments",
"specs": [
[
"==",
"2.18.0"
]
]
},
{
"name": "pypdf",
"specs": [
[
"==",
"4.2.0"
]
]
},
{
"name": "PyPika",
"specs": [
[
"==",
"0.48.9"
]
]
},
{
"name": "pyproject_hooks",
"specs": [
[
"==",
"1.1.0"
]
]
},
{
"name": "python-dateutil",
"specs": [
[
"==",
"2.9.0.post0"
]
]
},
{
"name": "python-dotenv",
"specs": [
[
"==",
"1.0.1"
]
]
},
{
"name": "python-multipart",
"specs": [
[
"==",
"0.0.9"
]
]
},
{
"name": "pytz",
"specs": [
[
"==",
"2024.1"
]
]
},
{
"name": "PyYAML",
"specs": [
[
"==",
"6.0.1"
]
]
},
{
"name": "referencing",
"specs": [
[
"==",
"0.35.1"
]
]
},
{
"name": "requests",
"specs": [
[
"==",
"2.31.0"
]
]
},
{
"name": "requests-oauthlib",
"specs": [
[
"==",
"2.0.0"
]
]
},
{
"name": "rich",
"specs": [
[
"==",
"13.7.1"
]
]
},
{
"name": "rpds-py",
"specs": [
[
"==",
"0.18.1"
]
]
},
{
"name": "rsa",
"specs": [
[
"==",
"4.9"
]
]
},
{
"name": "shellingham",
"specs": [
[
"==",
"1.5.4"
]
]
},
{
"name": "six",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "smmap",
"specs": [
[
"==",
"5.0.1"
]
]
},
{
"name": "sniffio",
"specs": [
[
"==",
"1.3.1"
]
]
},
{
"name": "SQLAlchemy",
"specs": [
[
"==",
"2.0.30"
]
]
},
{
"name": "starlette",
"specs": [
[
"==",
"0.37.2"
]
]
},
{
"name": "stqdm",
"specs": [
[
"==",
"0.0.5"
]
]
},
{
"name": "streamlit",
"specs": [
[
"==",
"1.34.0"
]
]
},
{
"name": "sympy",
"specs": [
[
"==",
"1.12"
]
]
},
{
"name": "tenacity",
"specs": [
[
"==",
"8.3.0"
]
]
},
{
"name": "tokenizers",
"specs": [
[
"==",
"0.19.1"
]
]
},
{
"name": "toml",
"specs": [
[
"==",
"0.10.2"
]
]
},
{
"name": "tomli",
"specs": [
[
"==",
"2.0.1"
]
]
},
{
"name": "toolz",
"specs": [
[
"==",
"0.12.1"
]
]
},
{
"name": "tornado",
"specs": [
[
"==",
"6.4"
]
]
},
{
"name": "tqdm",
"specs": [
[
"==",
"4.66.4"
]
]
},
{
"name": "typer",
"specs": [
[
"==",
"0.12.3"
]
]
},
{
"name": "typing-inspect",
"specs": [
[
"==",
"0.9.0"
]
]
},
{
"name": "typing_extensions",
"specs": [
[
"==",
"4.11.0"
]
]
},
{
"name": "tzdata",
"specs": [
[
"==",
"2024.1"
]
]
},
{
"name": "ujson",
"specs": [
[
"==",
"5.9.0"
]
]
},
{
"name": "urllib3",
"specs": [
[
"==",
"2.2.1"
]
]
},
{
"name": "uvicorn",
"specs": [
[
"==",
"0.29.0"
]
]
},
{
"name": "uvloop",
"specs": [
[
"==",
"0.19.0"
]
]
},
{
"name": "watchdog",
"specs": [
[
"==",
"4.0.0"
]
]
},
{
"name": "watchfiles",
"specs": [
[
"==",
"0.21.0"
]
]
},
{
"name": "websocket-client",
"specs": [
[
"==",
"1.8.0"
]
]
},
{
"name": "websockets",
"specs": [
[
"==",
"12.0"
]
]
},
{
"name": "wrapt",
"specs": [
[
"==",
"1.16.0"
]
]
},
{
"name": "yarl",
"specs": [
[
"==",
"1.9.4"
]
]
},
{
"name": "zipp",
"specs": [
[
"==",
"3.18.1"
]
]
}
],
"lcname": "docschat"
}