llama-index-readers-remote-depth

| Field | Value |
| --- | --- |
| Name | llama-index-readers-remote-depth |
| Version | 0.1.4 |
| Summary | llama-index readers remote_depth integration |
| upload_time | 2024-02-21 21:26:52 |
| maintainer | simonMoisselin |
| docs_url | None |
| author | Your Name |
| requires_python | >=3.8.1,<3.12 |
| license | MIT |
| keywords | hosted, multiple, url |
| requirements | No requirements were recorded. |

# Remote Page/File Loader

This loader makes it easy to extract the text from the links available on a webpage and to collect the links present on that page. It is based on `RemoteReader`, which reads a single page and is itself based on `SimpleDirectoryReader`, which parses the document when the file is a PDF or similar. It is an all-in-one tool for (almost) any group of URLs.

You can try it with this MIT lecture link; it should be able to extract the syllabus, the PDFs, etc.:
`https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/`

## Usage

Use the `depth` parameter to set how many levels of links to follow. For example, to extract the links in the page and also the links found in those linked pages, set `depth=2`.

```python
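from llama_index import download_loader

# Fetch the loader class by name.
RemoteDepthReader = download_loader("RemoteDepthReader")

# depth=2 follows the links on the start page and the links on those linked pages.
loader = RemoteDepthReader(depth=2)
# load_data returns the documents parsed from every page and file reached.
documents = loader.load_data(
    url="https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/"
)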
```
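
To make the meaning of `depth` concrete, here is a minimal sketch of a depth-limited link crawl using only the Python standard library. It is not the loader's actual implementation: `LinkParser`, `crawl`, `fake_fetch`, and the `PAGES` table are hypothetical names introduced for illustration, and the in-memory pages stand in for real HTTP requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, depth, fetch):
    """Return the URLs reached from start_url, grouped by link level.

    Level 0 is the start page itself; each further level holds the new
    URLs linked from the previous level. `fetch` is any callable that
    maps a URL to an HTML string (an assumption of this sketch).
    """
    seen = {start_url}
    frontier = [start_url]
    levels = [frontier]
    for _ in range(depth):
        next_frontier = []
        for url in frontier:
            parser = LinkParser()
            parser.feed(fetch(url))
            for href in parser.links:
                absolute = urljoin(url, href)  # resolve relative links
                if absolute not in seen:       # skip pages already visited
                    seen.add(absolute)
                    next_frontier.append(absolute)
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


# Hypothetical in-memory pages standing in for the network.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b.pdf">B</a>',
    "https://example.com/a": '<a href="/c">C</a> <a href="/">home</a>',
}


def fake_fetch(url):
    """Return the stored HTML for a URL, or an empty page if unknown."""
    return PAGES.get(url, "")


levels = crawl("https://example.com/", 2, fake_fetch)
```

With `depth=1` the crawl stops at the links on the start page; with `depth=2` it also follows the links found on those pages, which matches the behaviour described above.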

This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/tree/main/llama_index) and/or subsequently used as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent. See [here](https://github.com/emptycrown/llama-hub/tree/main) for examples.

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "llama-index-readers-remote-depth",
    "maintainer": "simonMoisselin",
    "docs_url": null,
    "requires_python": ">=3.8.1,<3.12",
    "maintainer_email": "",
    "keywords": "hosted,multiple,url",
    "author": "Your Name",
    "author_email": "you@example.com",
    "download_url": "https://files.pythonhosted.org/packages/55/b2/d7272f375dc44ebdc46290755caafc1c864b2a8ef11ecab2b4ccb34a46c5/llama_index_readers_remote_depth-0.1.4.tar.gz",
    "platform": null,
    "description": "# Remote Page/File Loader\n\nThis loader makes it easy to extract the text from the links available in a webpage URL, and extract the links presents in the page. It's based on `RemoteReader` (reading single page), that is based on `SimpleDirectoryReader` (parsing the document if file is a pdf, etc). It is an all-in-one tool for (almost) any group of urls.\n\nYou can try with this MIT lecture link, it will be able to extract the syllabus, the PDFs, etc:\n`https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/`\n\n## Usage\n\nYou need to specify the parameter `depth` to specify how many levels of links you want to extract. For example, if you want to extract the links in the page, and the links in the links in the page, you need to specify `depth=2`.\n\n```python\nfrom llama_index import download_loader\n\nRemoteDepthReader = download_loader(\"RemoteDepthReader\")\n\nloader = RemoteDepthReader()\ndocuments = loader.load_data(\n    url=\"https://ocw.mit.edu/courses/5-05-principles-of-inorganic-chemistry-iii-spring-2005/pages/syllabus/\"\n)\n```\n\nThis loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/tree/main/llama_index) and/or subsequently used as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent. See [here](https://github.com/emptycrown/llama-hub/tree/main) for examples.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index readers remote_depth integration",
    "version": "0.1.4",
    "project_urls": null,
    "split_keywords": [
        "hosted",
        "multiple",
        "url"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "c451b78c22309c99273873e330018c7ca3a0092c29c26650870ffda95b0cce2a",
                "md5": "24d9c60585a7df83c85a0f8cbc4b0bec",
                "sha256": "cba7042fd996b703cf6f444dbe8ebf826ca868df8affa95bb4dc16ab236f3892"
            },
            "downloads": -1,
            "filename": "llama_index_readers_remote_depth-0.1.4-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "24d9c60585a7df83c85a0f8cbc4b0bec",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.8.1,<3.12",
            "size": 3565,
            "upload_time": "2024-02-21T21:26:51",
            "upload_time_iso_8601": "2024-02-21T21:26:51.345035Z",
            "url": "https://files.pythonhosted.org/packages/c4/51/b78c22309c99273873e330018c7ca3a0092c29c26650870ffda95b0cce2a/llama_index_readers_remote_depth-0.1.4-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "55b2d7272f375dc44ebdc46290755caafc1c864b2a8ef11ecab2b4ccb34a46c5",
                "md5": "47d986e9d6d16e869c3979a9e8f1a045",
                "sha256": "ae112c02374824b4608833721a530f60cd55461897207f446c03b07b7f10b43e"
            },
            "downloads": -1,
            "filename": "llama_index_readers_remote_depth-0.1.4.tar.gz",
            "has_sig": false,
            "md5_digest": "47d986e9d6d16e869c3979a9e8f1a045",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.8.1,<3.12",
            "size": 3311,
            "upload_time": "2024-02-21T21:26:52",
            "upload_time_iso_8601": "2024-02-21T21:26:52.427235Z",
            "url": "https://files.pythonhosted.org/packages/55/b2/d7272f375dc44ebdc46290755caafc1c864b2a8ef11ecab2b4ccb34a46c5/llama_index_readers_remote_depth-0.1.4.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-02-21 21:26:52",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-readers-remote-depth"
}
        