miner-ai-beta


Name: miner-ai-beta
Version: 0.1.21
Home page: https://github.com/Valerio357/miner-ai
Upload time: 2024-05-09 09:59:15
Author: Valerio Domenici
Requires Python: <4.0,>=3.11
License: MIT
Keywords: langchain, documents_mining, web_scraping
# Miner AI Beta ⛏️ -- Library under construction

<p align="center">
  <img src="images/logo/MINER-AI.png" alt="Miner AI logo" style="width: 35%;">
</p>

Miner AI Beta compiles documents and web pages into a searchable database that works offline. It combines language models with vector indexing to make information retrieval fast and relevant without the need for an online connection.


## 💪 Features

- Indexing support for PDFs, PowerPoint presentations, Excel spreadsheets, web pages, and YouTube video transcripts.
- Uses embeddings and a vector store (FAISS, ChromaDB, etc.) to build search indexes.
- Merge functionality to combine multiple indexes for comprehensive search capabilities.
- Designed with modularity in mind, allowing for easy extension to support additional document types.
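
To make the indexing and merging ideas above concrete, here is a minimal, standard-library-only sketch. It is not how Miner AI Beta is implemented: `embed`, `cosine`, and `ToyIndex` are hypothetical names, and a bag-of-words counter stands in for a real embeddings model and vector store.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embeddings model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Hypothetical in-memory index: a list of (vector, text) pairs."""

    def __init__(self, texts=()):
        self.entries = [(embed(t), t) for t in texts]

    def merge(self, other: "ToyIndex") -> "ToyIndex":
        """Return a new index holding the entries of both indexes."""
        merged = ToyIndex()
        merged.entries = self.entries + other.entries
        return merged

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]


pdf_index = ToyIndex(["copper mining output rose in 2023", "gold prices fell"])
web_index = ToyIndex(["a guide to baking sourdough bread"])
combined = pdf_index.merge(web_index)
top_hit = combined.search("mining output", k=1)
print(top_hit)
```

A real embeddings model captures meaning rather than exact word overlap, but the flow is the same: embed each text, store the vectors, merge stores when needed, and rank by similarity at query time.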

## 💻 Installation

Miner AI Beta requires Python 3.11 or later (and below 4.0), as declared in the package metadata. It is recommended to use a virtual environment to manage the project dependencies.

To install Miner AI Beta and its dependencies, follow these steps:

```bash
# Install the library from PyPI (0.1.21 is the version documented on this page)
pip install miner-ai-beta==0.1.21
```
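
The package metadata declares `requires_python` as `<4.0,>=3.11`. As a small sketch, the interpreter can be checked against that range before installing; `python_is_supported` is a hypothetical helper, not part of the library.

```python
import sys


def python_is_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if `version` satisfies the declared range >=3.11,<4.0."""
    return (3, 11) <= version < (4, 0)


# Prints True or False depending on the interpreter running this snippet
print(python_is_supported())
```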

## ⚒️ Usage

1. **Indexing Documents**

   To start indexing your documents, you need to prepare your documents in the supported formats (PDF, PPTX, XLSX, web pages, YouTube videos).

   Example that indexes PDFs, documents, and Excel files, then merges the resulting indexes:

   ```python
   # Initialize your embeddings model
   from langchain_openai import OpenAIEmbeddings

   embeddings = OpenAIEmbeddings() 

   # Choose your vector store class (FAISS, ChromaDB, etc.); the class itself is passed to the loaders
   from langchain_community.vectorstores import FAISS

   vectorstore = FAISS

   # Index your documents inside the vectorstore
   from miner_ai_beta.loader import IndexFromPdfs
   from miner_ai_beta.loader import IndexFromDocs
   from miner_ai_beta.loader import IndexFromXlss

   folder_path = 'path/to/your/pdfs'

   pdf = IndexFromPdfs(folder_path, embeddings, vectorstore)
   doc = IndexFromDocs(folder_path, embeddings, vectorstore)
   excel = IndexFromXlss(folder_path, embeddings, vectorstore)

   # Merge your indexes
   from miner_ai_beta.loader import MergeIndexes

   final_index = MergeIndexes([pdf, doc, excel])

   # Save your index locally for later use
   final_index.save_local('path/to/your/final/index')
   ```


2. **Searching**

   To search within your index, load the saved index and query it through the vector store's retriever interface.

   Refer to your vector store's documentation for details on querying indexed data.

   Example for searching documents:

   ```python
   # After you have saved your index locally, you can load it later
   from langchain_community.vectorstores import FAISS
   from langchain_openai import OpenAIEmbeddings

   embeddings = OpenAIEmbeddings()
   # allow_dangerous_deserialization unpickles the saved index; enable it only for files you created yourself
   db = FAISS.load_local('path/to/your/final/index', embeddings, allow_dangerous_deserialization=True)

   # Use the vector store as a retriever that returns the 10 documents most relevant to the query
   retriever = db.as_retriever(search_kwargs={"k": 10})

   # Retrieve based on query string
   query = "What is the meaning of life?"
   result = retriever.invoke(query)  # returns a list of documents
   ```
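
The retriever returns a list of documents, each exposing `page_content` and `metadata`. The sketch below shows one way to turn such a list into readable lines. `Doc` is a hypothetical stand-in for the real document class so the snippet runs without any dependencies, and the `"source"` metadata key is an assumption that depends on the loader used.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_results(docs, max_chars: int = 60) -> list[str]:
    """Build one ranked line per document: rank, source, and a short snippet."""
    lines = []
    for rank, doc in enumerate(docs, start=1):
        # "source" is an assumed metadata key; fall back when it is missing
        source = doc.metadata.get("source", "unknown source")
        # Collapse whitespace and truncate the text to max_chars characters
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"{rank}. [{source}] {snippet}")
    return lines


sample = [
    Doc("Copper output rose\nsharply in 2023.", {"source": "report.pdf"}),
    Doc("Notes without any source metadata."),
]
for line in summarize_results(sample):
    print(line)
```

With a real retriever, `summarize_results(result)` could be applied to the returned list directly, since only the two attributes above are read.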

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request if you have suggestions or improvements.

## 📜 License

Miner AI Beta is licensed under the MIT License. See the [LICENSE](LICENSE) file for more information.

## Acknowledgements

- We would like to thank all the contributors to the project and the open-source community for their support.
- Miner AI Beta is intended only for AI data mining over documents and for research purposes. We are not responsible for any misuse of the library.
            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Valerio357/miner-ai",
    "name": "miner-ai-beta",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4.0,>=3.11",
    "maintainer_email": null,
    "keywords": "langchain, documents_mining, web_scraping",
    "author": "Valerio Domenici",
    "author_email": "valeriodomenici93@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/fa/eb/b2a1d2b567f97ed8afce10d473c267e3cf3d29ea9398e8249dd08ebedd7e/miner_ai_beta-0.1.21.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": null,
    "version": "0.1.21",
    "project_urls": {
        "Homepage": "https://github.com/Valerio357/miner-ai",
        "Repository": "https://github.com/Valerio357/miner-ai"
    },
    "split_keywords": [
        "langchain",
        " documents_mining",
        " web_scraping"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "0ddd896f8111b0d454e5afdbfb51a0074bedbeed7b4a1bed201211c072c68ed8",
                "md5": "cdbce691c3585257200aaf5f922e1f92",
                "sha256": "d6f43662ff44de7a8dd35aac52381a107083e5c06d67c2d8decb8df8be8571f6"
            },
            "downloads": -1,
            "filename": "miner_ai_beta-0.1.21-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cdbce691c3585257200aaf5f922e1f92",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.11",
            "size": 16642,
            "upload_time": "2024-05-09T09:59:14",
            "upload_time_iso_8601": "2024-05-09T09:59:14.646811Z",
            "url": "https://files.pythonhosted.org/packages/0d/dd/896f8111b0d454e5afdbfb51a0074bedbeed7b4a1bed201211c072c68ed8/miner_ai_beta-0.1.21-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "faebb2a1d2b567f97ed8afce10d473c267e3cf3d29ea9398e8249dd08ebedd7e",
                "md5": "6ef12d54d6119983ea3f3168b281b76a",
                "sha256": "e025a2b88922148cb09594549f5835d69c53a03f0da7ce7a57a888e5857aa0fe"
            },
            "downloads": -1,
            "filename": "miner_ai_beta-0.1.21.tar.gz",
            "has_sig": false,
            "md5_digest": "6ef12d54d6119983ea3f3168b281b76a",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.11",
            "size": 27647,
            "upload_time": "2024-05-09T09:59:15",
            "upload_time_iso_8601": "2024-05-09T09:59:15.984892Z",
            "url": "https://files.pythonhosted.org/packages/fa/eb/b2a1d2b567f97ed8afce10d473c267e3cf3d29ea9398e8249dd08ebedd7e/miner_ai_beta-0.1.21.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-05-09 09:59:15",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Valerio357",
    "github_project": "miner-ai",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "miner-ai-beta"
}
        