llama-index-readers-microsoft-onedrive


- Name: llama-index-readers-microsoft-onedrive
- Version: 0.3.0
- home_page: None
- Summary: llama-index readers microsoft_onedrive integration
- upload_time: 2024-11-18 00:17:10
- maintainer: godwin3737
- docs_url: None
- author: Your Name
- requires_python: `<4.0,>=3.9`
- license: MIT
- keywords: microsoft 365, microsoft onedrive, microsoft365, onedrive for business, onedrive personal, onedrive
# Microsoft OneDrive Loader

```bash
pip install llama-index-readers-microsoft-onedrive
```

This loader reads files from:

- Microsoft OneDrive Personal [(https://onedrive.live.com/)](https://onedrive.live.com/) and
- Microsoft OneDrive for Business [(https://portal.office.com/onedrive)](https://portal.office.com/onedrive).

It supports recursively traversing and downloading files from subfolders, and it can restrict downloads to files with specific MIME types. To use this loader, pass in a list of file or folder IDs, or file or folder paths.

#### Subfolder traversing (enabled by default)

To disable: `loader.load_data(recursive=False)`

#### Mime types

You can also filter files by MIME type, e.g.: `mime_types=["application/vnd.openxmlformats-officedocument.wordprocessingml.document"]`
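
As a minimal sketch, the `mime_types` list can be built from file extensions. The helper `mime_types_for` and its lookup table are illustrative assumptions and not part of the loader; only the MIME type strings themselves are standard values.

```python
# Hypothetical helper (not part of llama-index): map file extensions to the
# MIME type strings that can be passed as `mime_types`.
KNOWN_MIME_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "pdf": "application/pdf",
    "txt": "text/plain",
}


def mime_types_for(extensions):
    """Return the MIME types for the given extensions, in order, without repeats."""
    result = []
    for ext in extensions:
        key = ext.lower().lstrip(".")
        if key not in KNOWN_MIME_TYPES:
            raise ValueError(f"No MIME type known for extension: {ext!r}")
        mime = KNOWN_MIME_TYPES[key]
        if mime not in result:
            result.append(mime)
    return result


mime_types = mime_types_for([".docx", "PDF"])
# The list could then be passed on, e.g. loader.load_data(mime_types=mime_types).
```

Raising on an unknown extension, rather than skipping it, keeps a typo from silently widening or narrowing the filter.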

### Authentication

OneDriveReader supports the following two **MSAL authentication** modes:

#### 1. User Authentication: Browser-based authentication

- You need to create an app registration in Microsoft Entra (formerly Azure Active Directory).
- Interactive authentication uses a browser, so the registered application must have a **redirect URI** set to _'https://localhost'_ under mobile and native applications.
- This mode of authentication is not suitable for CI/CD or other background service scenarios where manual authentication isn't feasible.
- API permission required for the registered app:
  > Microsoft Graph --> Delegated Permission --> Files.Read.All

#### 2. App Authentication: Client ID & Client Secret based authentication

- You need to create an app registration in Microsoft Entra (formerly Azure Active Directory).
- For silent authentication to work, you also need to create a client secret for the app.
- Microsoft does not currently support this mode of authentication for OneDrive Personal, so it can be used only with OneDrive for Business (Microsoft 365).
- API permissions required for the registered app:

  > Microsoft Graph --> Application Permissions --> Files.Read.All (**Grant Admin Consent**)

  > Microsoft Graph --> Application Permissions --> User.Read.All (**Grant Admin Consent**)
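
The two modes above roughly correspond to MSAL's public-client (interactive) and confidential-client (client secret) flows. The sketch below is an illustration under assumptions, not the loader's actual implementation: the function names `choose_auth_flow` and `acquire_token` are hypothetical, and using the `consumers` authority for personal accounts is an assumption. The `msal` package is imported lazily, so only `acquire_token` needs it installed.

```python
from typing import Optional

GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def choose_auth_flow(client_secret: Optional[str] = None,
                     tenant_id: str = "consumers") -> dict:
    """Describe which flow applies (hypothetical helper, not the loader's API)."""
    authority = f"https://login.microsoftonline.com/{tenant_id}/"
    if client_secret:
        # App authentication: application permissions via the .default scope.
        return {"flow": "app", "authority": authority,
                "scopes": [GRAPH_DEFAULT_SCOPE]}
    # User authentication: delegated permission requested interactively.
    return {"flow": "interactive", "authority": authority,
            "scopes": ["Files.Read.All"]}


def acquire_token(client_id: str, client_secret: Optional[str] = None,
                  tenant_id: str = "consumers") -> dict:
    """Acquire a token with MSAL; the result holds 'access_token' or 'error'."""
    import msal  # third-party dependency, imported only when actually needed

    plan = choose_auth_flow(client_secret, tenant_id)
    if plan["flow"] == "app":
        app = msal.ConfidentialClientApplication(
            client_id,
            client_credential=client_secret,
            authority=plan["authority"],
        )
        return app.acquire_token_for_client(scopes=plan["scopes"])
    app = msal.PublicClientApplication(client_id, authority=plan["authority"])
    return app.acquire_token_interactive(scopes=plan["scopes"])
```

Separating the decision (`choose_auth_flow`) from the network call makes it possible to check which flow a given configuration would take without opening a browser.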

## Usage

### OneDrive Personal

https://onedrive.live.com/

> Note: If you are trying to connect to OneDrive Personal, you can initialize OneDriveReader with just your `client_id` and interactive login. Microsoft _doesn't_ currently support App authentication for OneDrive Personal.

#### folder_id

You can extract a folder_id directly from its URL.

For example, the folder_id of `https://onedrive.live.com/?id=B5AF52B769DFDE4%216107&cid=0B5AF52B769DFDdRE4` is `B5AF52B769DFDE4%216107`.

#### file_id

You can extract a file_id directly from its preview URL.

For example, the file_id of `https://onedrive.live.com/?cid=0B5AF52BE769DFDE4&id=B5AF52B769DFDE4%216106&parId=root&o=OneUp` is `B5AF52B769DFDE4%216106`.
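
Pulling the ID out of either URL can be sketched with the standard library. The helper name `extract_item_id` is hypothetical. It reads the raw `id` query parameter without decoding it, on the assumption that the percent-encoded form shown in the examples above (with `%21`) is the one to pass on.

```python
from urllib.parse import urlsplit


def extract_item_id(url: str) -> str:
    """Return the raw (still percent-encoded) value of the `id` query parameter."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "id":
            return value
    raise ValueError("URL has no 'id' query parameter")


folder_id = extract_item_id(
    "https://onedrive.live.com/?id=B5AF52B769DFDE4%216107&cid=0B5AF52B769DFDdRE4"
)
file_id = extract_item_id(
    "https://onedrive.live.com/?cid=0B5AF52BE769DFDE4&id=B5AF52B769DFDE4%216106&parId=root&o=OneUp"
)
```

`urllib.parse.parse_qs` is deliberately avoided here because it would decode `%21` to `!`, which would no longer match the IDs as written in this README.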

#### OneDrive Personal Example Usage:

```python
from llama_index.readers.microsoft_onedrive import OneDriveReader

# User authentication flow: replace the client ID with your own
loader = OneDriveReader(client_id="82ee706e-2439-47fa-877a-95048ead9318")

# App authentication flow: not supported by Microsoft for OneDrive Personal

# Get all documents, including those in subfolders
documents = loader.load_data()

# Get documents by folder_id; recursive defaults to True, so set it to False
# explicitly to skip subfolders
documents = loader.load_data(folder_id="folderid", recursive=False)

# Get documents by file IDs
documents = loader.load_data(file_ids=["fileid1", "fileid2"])
```

### OneDrive For Business

https://portal.office.com/onedrive

> Note: If you are an organization trying to connect to OneDrive for Business (part of Microsoft 365), you need to:

1. Initialize OneDriveReader with the correct **tenant_id**, along with a client_id and client_secret registered for the tenant.
2. Invoke the load_data method with **userprincipalname** (the organization-provided email in most cases).

#### folder_path

The relative path of a subfolder from the root folder (Documents).

For example:

- The path of the first-level subfolder named "drice co" (within the root folder), with the URL `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test`, is **drice%20co**

- The path of the second-level subfolder "test" (within the "drice co" subfolder), with the URL `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test`, is **drice%20co/test**

#### file_path

The relative path of a file from the root folder (Documents).

For example, the path of the file "demo_doc.docx" within the "test" subfolder from the previous example, with the URL `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test/demo_doc.docx`, is **drice%20co/test/demo_doc.docx**
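
Deriving these relative paths from a OneDrive for Business URL can be sketched as follows. The helper `relative_path_from_url` is hypothetical. It assumes the `id` query parameter contains the full server path and that everything after `/Documents/` is the relative path, kept percent-encoded as in the examples above.

```python
from urllib.parse import urlsplit


def relative_path_from_url(url: str, root: str = "Documents") -> str:
    """Return the part of the `id` query parameter that follows `/<root>/`."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "id":
            _, marker, tail = value.partition(f"/{root}/")
            if not marker:
                raise ValueError(f"'id' does not contain '/{root}/'")
            return tail
    raise ValueError("URL has no 'id' query parameter")


base = (
    "https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/"
    "_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/"
)
folder_path = relative_path_from_url(base + "drice%20co/test")
file_path = relative_path_from_url(base + "drice%20co/test/demo_doc.docx")
```

The same function covers folders and files because both are addressed relative to the same root folder.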

#### OneDrive For Business Example Usage:

```python
from llama_index.readers.microsoft_onedrive import OneDriveReader

loader = OneDriveReader(
    client_id="82ee706e-2439-47fa-877a-95048ead9318",
    tenant_id="02ee706f-2439-47fa-877a-95048ead9318",
    client_secret="YOUR_SECRET",
)

# Get all docx or pdf documents (subfolders included)
documents = loader.load_data(
    mime_types=[
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "application/pdf",
    ],
    userprincipalname="godwin@foobar.onmicrosoft.com",
)

# Get all documents from a folder in the given user's OneDrive for Business
documents = loader.load_data(
    folder_path="subfolder/subfolder2",
    userprincipalname="godwin@foobar.onmicrosoft.com",
)

# Get documents by file paths, with the userprincipalname (organization-provided email) of the user
documents = loader.load_data(
    file_paths=[
        "subfolder/subfolder2/fileid1.pdf",
        "subfolder/subfolder3/fileid2.docx",
    ],
    userprincipalname="godwin@foobar.onmicrosoft.com",
)
```
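
The example above hard-codes the client secret for brevity. One way to avoid that is to read the three constructor arguments from environment variables, sketched below. The helper `onedrive_credentials_from_env` and the variable names `ONEDRIVE_CLIENT_ID`, `ONEDRIVE_TENANT_ID`, and `ONEDRIVE_CLIENT_SECRET` are assumptions made for illustration; the loader does not define them.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names, keyed by OneDriveReader keyword argument.
ENV_VARS = {
    "client_id": "ONEDRIVE_CLIENT_ID",
    "tenant_id": "ONEDRIVE_TENANT_ID",
    "client_secret": "ONEDRIVE_CLIENT_SECRET",
}


def onedrive_credentials_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the reader's keyword arguments from the environment."""
    env = os.environ if env is None else env
    missing = [var for var in ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in ENV_VARS.items()}


# An explicit mapping stands in for the real environment in this example.
kwargs = onedrive_credentials_from_env(
    {
        "ONEDRIVE_CLIENT_ID": "82ee706e-2439-47fa-877a-95048ead9318",
        "ONEDRIVE_TENANT_ID": "02ee706f-2439-47fa-877a-95048ead9318",
        "ONEDRIVE_CLIENT_SECRET": "YOUR_SECRET",
    }
)
# The result could then be unpacked into the reader: OneDriveReader(**kwargs)
```

Failing fast with the list of missing variables tends to be easier to debug than an authentication error raised later by the token request.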

#### Author

[Godwin Paul Vincent](https://github.com/godwin3737)

This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).

            
