# Miner AI Beta ⛏️ -- Library under construction
<p align="center">
<img src="images/logo/MINER-AI.png" alt="Miner AI Logo" style="width: 35%;">
</p>
Miner AI Beta compiles a collection of documents and web pages into a searchable database that is stored and queried offline. It combines language-model embeddings with vector indexing to make retrieval fast and relevant without the need for an online connection.
## 💪 Features
- Indexing support for PDFs, PowerPoint presentations, Excel spreadsheets, web pages, and YouTube video transcripts.
- Uses embeddings and a vector store to build efficient search indexes.
- Merge functionality to combine multiple indexes for comprehensive search capabilities.
- Designed with modularity in mind, allowing for easy extension to support additional document types.
## 💻 Installation
Miner AI Beta requires Python 3.11 or later (and below 4.0). A virtual environment is recommended for managing the project dependencies.
To install Miner AI Beta and its dependencies, run:
```bash
# Install the library from PyPI
pip install miner-ai-beta==<last-code-version>
```
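Before installing, it can help to confirm that the active interpreter falls inside the range the package declares. The sketch below is a hypothetical helper, not part of Miner AI Beta; its bounds mirror the `requires_python` value published with the package (`>=3.11,<4.0`).

```python
import sys

# Bounds taken from the package's published `requires_python` (">=3.11,<4.0").
MIN_VERSION = (3, 11)
MAX_VERSION_EXCLUSIVE = (4, 0)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version lies in the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


# Report whether the interpreter running this script qualifies.
if python_is_supported():
    print("Python version is supported")
else:
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside the supported range")
```

Run it with the interpreter of the virtual environment you plan to install into, since that is the version `pip` will check.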
## ⚒️ Usage
1. **Indexing Documents**
Before indexing, gather your documents in one of the supported formats (PDF, PPTX, XLSX, web pages, YouTube videos).
Example that indexes PDFs, documents, and Excel spreadsheets, then merges and saves the result:
```python
# Initialize your embeddings model
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Initialize your vector store (FAISS, ChromaDB, etc.)
from langchain.vectorstores.faiss import FAISS
vectorstore = FAISS  # the class itself is passed here, not an instance
# Index your documents inside the vectorstore
from miner_ai_beta.loader import IndexFromPdfs
from miner_ai_beta.loader import IndexFromDocs
from miner_ai_beta.loader import IndexFromXlss
folder_path = 'path/to/your/pdfs'
pdf = IndexFromPdfs(folder_path, embeddings, vectorstore)
doc = IndexFromDocs(folder_path, embeddings, vectorstore)
excel = IndexFromXlss(folder_path, embeddings, vectorstore)
# Merge your indexes
from miner_ai_beta.loader import MergeIndexes
final_index = MergeIndexes([pdf, doc, excel])
# Save your index locally for later use
final_index.save_local('path/to/your/final/index')
```
2. **Searching**
To search your index, load it back into a vector store and query it, for example through a retriever.
Refer to your vector store's documentation for the details of querying indexed data.
Example for searching documents:
```python
# After you have saved your index locally, you can load it later
from langchain.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
# Use the same embeddings model that was used to build the index
embeddings = OpenAIEmbeddings()
# The saved index is unpickled on load, so only load index files you created or trust
db = FAISS.load_local('path/to/your/final/index', embeddings, allow_dangerous_deserialization=True)
# Use the vector store as a retriever that returns the 10 documents most relevant to the query
retriever = db.as_retriever(search_kwargs={"k": 10})
# Retrieve based on query string
query = "What is the meaning of life?"
result = retriever.invoke(query) # returns a list of documents
```
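The index, merge, and top-k retrieval steps above can be pictured with a small standard-library sketch. It is not how Miner AI Beta or FAISS are implemented: `ToyIndex`, `embed`, and `cosine` are hypothetical names, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a sparse bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index that stores (text, vector) pairs."""

    def __init__(self, texts=()):
        self.entries = [(text, embed(text)) for text in texts]

    def merge(self, other):
        """Return a new index holding the entries of both indexes."""
        merged = ToyIndex()
        merged.entries = self.entries + other.entries
        return merged

    def search(self, query, k=10):
        """Return the k texts most similar to the query, best match first."""
        query_vec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


# One small index per source type, as in the indexing example.
pdf_index = ToyIndex([
    "Quarterly revenue grew in the retail segment.",
    "The warranty covers parts and labour for two years.",
])
xlsx_index = ToyIndex([
    "Retail revenue by quarter: Q1, Q2, Q3, Q4.",
    "Headcount by department.",
])

# Merge the indexes, then retrieve the two best matches for a query.
final_index = pdf_index.merge(xlsx_index)
results = final_index.search("retail revenue per quarter", k=2)
print(results)
```

Here the spreadsheet line ranks first because it shares three query words, and the PDF sentence about retail revenue ranks second. A real vector store does the same ranking over dense embedding vectors, which is why the same embeddings model must be used for indexing and for querying.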
## 🤝 Contributing
Contributions are welcome! Feel free to open an issue or pull request if you have suggestions or improvements.
## 📜 License
Miner AI Beta is licensed under the MIT License. See the [LICENSE](LICENSE) file for more information.
## Acknowledgements
- We would like to thank all the contributors to the project and the open-source community for their support.
- Miner AI Beta is intended for AI data mining over documents and for research purposes only. We are not responsible for any misuse of the library.
Raw data
{
"_id": null,
"home_page": "https://github.com/Valerio357/miner-ai",
"name": "miner-ai-beta",
"maintainer": null,
"docs_url": null,
"requires_python": "<4.0,>=3.11",
"maintainer_email": null,
"keywords": "langchain, documents_mining, web_scraping",
"author": "Valerio Domenici",
"author_email": "valeriodomenici93@gmail.com",
"download_url": "https://files.pythonhosted.org/packages/fa/eb/b2a1d2b567f97ed8afce10d473c267e3cf3d29ea9398e8249dd08ebedd7e/miner_ai_beta-0.1.21.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": null,
"version": "0.1.21",
"project_urls": {
"Homepage": "https://github.com/Valerio357/miner-ai",
"Repository": "https://github.com/Valerio357/miner-ai"
},
"split_keywords": [
"langchain",
" documents_mining",
" web_scraping"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0ddd896f8111b0d454e5afdbfb51a0074bedbeed7b4a1bed201211c072c68ed8",
"md5": "cdbce691c3585257200aaf5f922e1f92",
"sha256": "d6f43662ff44de7a8dd35aac52381a107083e5c06d67c2d8decb8df8be8571f6"
},
"downloads": -1,
"filename": "miner_ai_beta-0.1.21-py3-none-any.whl",
"has_sig": false,
"md5_digest": "cdbce691c3585257200aaf5f922e1f92",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4.0,>=3.11",
"size": 16642,
"upload_time": "2024-05-09T09:59:14",
"upload_time_iso_8601": "2024-05-09T09:59:14.646811Z",
"url": "https://files.pythonhosted.org/packages/0d/dd/896f8111b0d454e5afdbfb51a0074bedbeed7b4a1bed201211c072c68ed8/miner_ai_beta-0.1.21-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "faebb2a1d2b567f97ed8afce10d473c267e3cf3d29ea9398e8249dd08ebedd7e",
"md5": "6ef12d54d6119983ea3f3168b281b76a",
"sha256": "e025a2b88922148cb09594549f5835d69c53a03f0da7ce7a57a888e5857aa0fe"
},
"downloads": -1,
"filename": "miner_ai_beta-0.1.21.tar.gz",
"has_sig": false,
"md5_digest": "6ef12d54d6119983ea3f3168b281b76a",
"packagetype": "sdist",
"python_version": "source",
"requires_python": "<4.0,>=3.11",
"size": 27647,
"upload_time": "2024-05-09T09:59:15",
"upload_time_iso_8601": "2024-05-09T09:59:15.984892Z",
"url": "https://files.pythonhosted.org/packages/fa/eb/b2a1d2b567f97ed8afce10d473c267e3cf3d29ea9398e8249dd08ebedd7e/miner_ai_beta-0.1.21.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-05-09 09:59:15",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "Valerio357",
"github_project": "miner-ai",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "miner-ai-beta"
}