llama-index-readers-microsoft-onedrive


- Name: llama-index-readers-microsoft-onedrive
- Version: 0.2.1
- Summary: llama-index readers microsoft_onedrive integration
- Upload time: 2024-10-26 16:26:58
- Maintainer: godwin3737
- Requires Python: <4.0,>=3.8.1
- License: MIT
- Keywords: microsoft 365, microsoft onedrive, microsoft365, onedrive for business, onedrive personal, onedrive

# Microsoft OneDrive Loader

```bash
pip install llama-index-readers-microsoft-onedrive
```

This loader reads files from:

- Microsoft OneDrive Personal [(https://onedrive.live.com/)](https://onedrive.live.com/) and
- Microsoft OneDrive for Business [(https://portal.office.com/onedrive)](https://portal.office.com/onedrive).

It can recursively traverse subfolders and download only files with specific MIME types. To limit what it loads, pass a folder ID or folder path, or a list of file IDs or file paths; called with no arguments, it loads all documents, including those in subfolders.

#### Subfolder traversing (enabled by default)

To disable it: `loader.load_data(recursive=False)`

#### Mime types

You can also filter files by MIME type, e.g.: `mime_types=["application/vnd.openxmlformats-officedocument.wordprocessingml.document"]`
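
As a rough illustration of what `recursive` and `mime_types` control, the sketch below walks an in-memory folder tree. The tree shape (Graph-style `file.mimeType` and `folder` keys plus a hypothetical `children` list) and the `iter_files` helper are assumptions made for this example, not the reader's internals; the real reader fetches items from Microsoft Graph.

```python
from typing import Dict, Iterator, List, Optional

DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def iter_files(
    folder: Dict,
    recursive: bool = True,
    mime_types: Optional[List[str]] = None,
) -> Iterator[str]:
    """Yield file names under `folder` (hypothetical helper, illustration only).

    Items loosely mimic Microsoft Graph driveItems: files carry
    {"file": {"mimeType": ...}}, folders carry {"folder": {...}}.
    The "children" key is an assumption of this sketch.
    """
    for item in folder.get("children", []):
        if "folder" in item:
            # Descend into subfolders only when recursive traversal is enabled.
            if recursive:
                yield from iter_files(item, recursive, mime_types)
        elif "file" in item:
            # Keep the file when no filter is set or its MIME type is allowed.
            if mime_types is None or item["file"]["mimeType"] in mime_types:
                yield item["name"]


root = {
    "children": [
        {"name": "a.docx", "file": {"mimeType": DOCX}},
        {"name": "b.pdf", "file": {"mimeType": "application/pdf"}},
        {
            "name": "sub",
            "folder": {"childCount": 1},
            "children": [{"name": "c.docx", "file": {"mimeType": DOCX}}],
        },
    ]
}

print(list(iter_files(root)))                     # ['a.docx', 'b.pdf', 'c.docx']
print(list(iter_files(root, recursive=False)))    # ['a.docx', 'b.pdf']
print(list(iter_files(root, mime_types=[DOCX])))  # ['a.docx', 'c.docx']
```

The two options are independent: `recursive` decides which folders are visited, while `mime_types` decides which files inside those folders are kept.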

### Authentication

OneDriveReader supports the following two **MSAL authentication** flows:

#### 1. User authentication: browser-based (interactive)

- You need to create an app registration in Microsoft Entra (formerly Azure Active Directory).
- Interactive authentication opens a browser, so the registered application should have a **redirect URI** set to _'https://localhost'_ under mobile and native applications.
- This mode is not suitable for CI/CD or other background-service scenarios where manual sign-in isn't feasible.
- API permission required for the registered app:
  > Microsoft Graph --> Delegated Permissions --> Files.Read.All

#### 2. App authentication: client ID and client secret

- You need to create an app registration in Microsoft Entra (formerly Azure Active Directory).
- For silent (non-interactive) authentication to work, you also need to create a client secret for the app.
- Microsoft does not currently support this mode for OneDrive Personal, so it can be used only with OneDrive for Business (Microsoft 365).
- API permissions required for the registered app:

  > Microsoft Graph --> Application Permissions --> Files.Read.All (**Grant Admin Consent**)

  > Microsoft Graph --> Application Permissions --> User.Read.All (**Grant Admin Consent**)
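
To make the split between the two flows concrete, here is a hedged sketch of how the constructor arguments could map to an MSAL flow. The `plan_auth` helper and `AuthPlan` type are hypothetical. The scopes follow general MSAL and Microsoft Graph conventions (the client-credentials flow requests the `https://graph.microsoft.com/.default` scope), and using the `consumers` authority for personal accounts is an assumption; none of this describes what OneDriveReader does internally.

```python
from dataclasses import dataclass
from typing import List, Optional

GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"
GRAPH_FILES_READ_ALL = "https://graph.microsoft.com/Files.Read.All"


@dataclass
class AuthPlan:
    flow: str  # "app" (client credentials) or "interactive" (browser)
    authority: str
    scopes: List[str]


def plan_auth(
    client_id: str,
    tenant_id: Optional[str] = None,
    client_secret: Optional[str] = None,
) -> AuthPlan:
    """Pick an authentication flow from the supplied credentials (hypothetical helper)."""
    if not client_id:
        raise ValueError("client_id is required for both flows")
    if client_secret:
        # App authentication is only for OneDrive for Business, so a tenant is required.
        if not tenant_id:
            raise ValueError("app authentication needs tenant_id as well as client_secret")
        # With msal this would correspond to:
        #   msal.ConfidentialClientApplication(client_id, authority=..., client_credential=client_secret)
        #       .acquire_token_for_client(scopes=...)
        return AuthPlan(
            "app", f"https://login.microsoftonline.com/{tenant_id}", [GRAPH_DEFAULT_SCOPE]
        )
    # With msal this would correspond to:
    #   msal.PublicClientApplication(client_id, authority=...).acquire_token_interactive(scopes=...)
    # Assumption: personal Microsoft accounts sign in through the "consumers" authority.
    authority_tenant = tenant_id or "consumers"
    return AuthPlan(
        "interactive",
        f"https://login.microsoftonline.com/{authority_tenant}",
        [GRAPH_FILES_READ_ALL],
    )


print(plan_auth("82ee706e-2439-47fa-877a-95048ead9318").flow)  # interactive
```

Supplying `client_secret` without `tenant_id` fails fast here, mirroring the requirement above that app authentication applies only to a Microsoft 365 tenant.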

## Usage

### OneDrive Personal

https://onedrive.live.com/

> Note: If you are trying to connect to OneDrive Personal, you can initialize OneDriveReader with just your `client_id` and use interactive login. Microsoft _doesn't_ currently support app authentication for OneDrive Personal.

#### folder_id

You can extract a folder_id directly from its URL.

For example, the folder_id of `https://onedrive.live.com/?id=B5AF52B769DFDE4%216107&cid=0B5AF52B769DFDdRE4` is `B5AF52B769DFDE4%216107`.

#### file_id

You can extract a file_id directly from its preview URL.

For example, the file_id of `https://onedrive.live.com/?cid=0B5AF52BE769DFDE4&id=B5AF52B769DFDE4%216106&parId=root&o=OneUp` is `B5AF52B769DFDE4%216106`.
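
Both IDs live in the `id` query parameter of the URL. A small standard-library sketch (the `extract_onedrive_id` name is hypothetical) that pulls the value out without URL-decoding it, so the result matches the examples above, `%21` included:

```python
from urllib.parse import urlsplit


def extract_onedrive_id(url: str) -> str:
    """Return the raw (still percent-encoded) value of the `id` query parameter."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        # Match the key exactly so "parId=root" is not mistaken for "id".
        if key == "id":
            return value
    raise ValueError(f"no 'id' query parameter in {url!r}")


folder_url = "https://onedrive.live.com/?id=B5AF52B769DFDE4%216107&cid=0B5AF52B769DFDdRE4"
file_url = "https://onedrive.live.com/?cid=0B5AF52BE769DFDE4&id=B5AF52B769DFDE4%216106&parId=root&o=OneUp"
print(extract_onedrive_id(folder_url))  # B5AF52B769DFDE4%216107
print(extract_onedrive_id(file_url))    # B5AF52B769DFDE4%216106
```

If you need the decoded form instead, `urllib.parse.unquote` turns `%21` into `!`; the examples in this README use the encoded form.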

#### OneDrive Personal Example Usage:

```python
from llama_index.readers.microsoft_onedrive import OneDriveReader

# User Authentication flow: Replace client id with your own id
loader = OneDriveReader(client_id="82ee706e-2439-47fa-877a-95048ead9318")

# APP Authentication flow: NOT SUPPORTED By Microsoft

#### Get all documents including subfolders.
documents = loader.load_data()

#### Get documents by folder_id; recursive defaults to True, so set it to False to skip subfolders
documents = loader.load_data(folder_id="folderid", recursive=False)

#### Using file ids
documents = loader.load_data(file_ids=["fileid1", "fileid2"])
```

### OneDrive For Business

https://portal.office.com/onedrive

> Note: If you are an organization connecting to OneDrive for Business (part of Microsoft 365), you need to:

1. Initialize OneDriveReader with the correct **tenant_id**, along with a client_id and client_secret registered for that tenant.
2. Invoke the load_data method with **userprincipalname** (the organization-provided email address in most cases).
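
The `userprincipalname` requirement fits how Microsoft Graph works: an app-only token has no signed-in user, so `/me/drive` cannot be resolved and the drive must be addressed through `/users/{userPrincipalName}/drive` instead. A sketch of the corresponding Graph "list children" URLs, assuming v1.0 endpoints and path-based addressing (`root:/{path}:`); `drive_children_url` is a hypothetical helper, not part of OneDriveReader:

```python
from typing import Optional
from urllib.parse import quote

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def drive_children_url(
    userprincipalname: Optional[str] = None,
    folder_path: Optional[str] = None,
) -> str:
    """Build a Graph 'list children' URL (hypothetical helper, illustration only)."""
    if userprincipalname:
        # App-only tokens have no "me", so address the user's drive explicitly.
        drive = f"{GRAPH_BASE}/users/{quote(userprincipalname, safe='@')}/drive"
    else:
        # Delegated (interactive) tokens can use the signed-in user's drive.
        drive = f"{GRAPH_BASE}/me/drive"
    if folder_path:
        # Path-based addressing relative to the drive root. The path is used as given,
        # so already percent-encoded segments such as "drice%20co" stay encoded.
        return f"{drive}/root:/{folder_path.strip('/')}:/children"
    return f"{drive}/root/children"


print(drive_children_url("godwin@foobar.onmicrosoft.com", "drice%20co/test"))
```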

#### folder_path

The path of a subfolder relative to the root folder (Documents).

For example:

- The path of the first-level subfolder "drice co" (within the root folder), with URL `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co`, is **drice%20co**

- The path of the second-level subfolder "test" (within the drice co subfolder), with URL `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test`, is **drice%20co/test**

#### file_path

The path of a file relative to the root folder (Documents).

For example, the path of the file "demo_doc.docx" in the test subfolder from the previous example, with URL `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test/demo_doc.docx`, is **drice%20co/test/demo_doc.docx**
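
Because the relative path is whatever follows `/Documents/` in the `id` query parameter, it can be extracted mechanically for both folders and files. A standard-library sketch, assuming the root folder is named `Documents` as in the examples above (`relative_path_from_url` is a hypothetical helper):

```python
from urllib.parse import urlsplit

ROOT_MARKER = "/Documents/"  # assumption: the OneDrive root folder is "Documents"


def relative_path_from_url(url: str) -> str:
    """Return the folder or file path relative to the root, still percent-encoded."""
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        if key == "id":
            start = value.find(ROOT_MARKER)
            if start == -1:
                raise ValueError(f"{ROOT_MARKER!r} not found in id={value!r}")
            return value[start + len(ROOT_MARKER):]
    raise ValueError(f"no 'id' query parameter in {url!r}")


base = (
    "https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com"
    "/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/"
)
print(relative_path_from_url(base + "drice%20co"))                     # drice%20co
print(relative_path_from_url(base + "drice%20co/test"))                # drice%20co/test
print(relative_path_from_url(base + "drice%20co/test/demo_doc.docx"))  # drice%20co/test/demo_doc.docx
```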

#### OneDrive For Business Example Usage:

```python
from llama_index.readers.microsoft_onedrive import OneDriveReader

loader = OneDriveReader(
    client_id="82ee706e-2439-47fa-877a-95048ead9318",
    tenant_id="02ee706f-2439-47fa-877a-95048ead9318",
    client_secret="YOUR_SECRET",
)

#### Get all docx or pdf documents (subfolders included).
documents = loader.load_data(
    mime_types=[
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "application/pdf",
    ],
    userprincipalname="godwin@foobar.onmicrosoft.com",
)

#### Get all documents from a folder in the given user's OneDrive for Business
documents = loader.load_data(
    folder_path="subfolder/subfolder2",
    userprincipalname="godwin@foobar.onmicrosoft.com",
)

#### Using file paths and the userprincipalname (organization-provided email) of the user
documents = loader.load_data(
    file_paths=[
        "subfolder/subfolder2/fileid1.pdf",
        "subfolder/subfolder3/fileid2.docx",
    ],
    userprincipalname="godwin@foobar.onmicrosoft.com",
)
```

#### Author

[Godwin Paul Vincent](https://github.com/godwin3737)

This loader is designed to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).
