# Microsoft OneDrive Loader
```bash
pip install llama-index-readers-microsoft-onedrive
```
This loader reads files from:
- Microsoft OneDrive Personal [(https://onedrive.live.com/)](https://onedrive.live.com/) and
- Microsoft OneDrive for Business [(https://portal.office.com/onedrive)](https://portal.office.com/onedrive).
It can recursively traverse and download files from subfolders, and it can restrict downloads to files with specific MIME types. To select what to load, pass in file or folder IDs, or file or folder paths.
#### Subfolder traversal (enabled by default)
To disable it: `loader.load_data(recursive=False)`
#### Mime types
You can also filter files by MIME type, e.g.: `mime_types=["application/vnd.openxmlformats-officedocument.wordprocessingml.document"]`
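MIME type strings are long and easy to mistype, so it can help to build the `mime_types` list from file extensions. The sketch below is not part of the loader: `EXTENSION_TO_MIME` and `mime_types_for` are hypothetical names, and the table covers only a few common formats.

```python
from typing import Dict, Iterable, List

# Hypothetical lookup table (not provided by the loader); extend as needed.
EXTENSION_TO_MIME: Dict[str, str] = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".pdf": "application/pdf",
    ".txt": "text/plain",
}


def mime_types_for(extensions: Iterable[str]) -> List[str]:
    """Translate extensions such as "pdf" or ".docx" into MIME type strings."""
    result = []
    for ext in extensions:
        # Normalize "PDF", ".pdf" and "pdf" to the same ".pdf" key.
        key = "." + ext.lower().lstrip(".")
        if key not in EXTENSION_TO_MIME:
            raise ValueError(f"No MIME type known for extension {ext!r}")
        result.append(EXTENSION_TO_MIME[key])
    return result
```

The result can then be passed straight to the loader, for example `loader.load_data(mime_types=mime_types_for(["docx", "pdf"]))`.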
### Authentication
OneDriveReader supports the following two **MSAL authentication** modes:
#### 1. User Authentication: Browser-based authentication
- You need to create an app registration in Microsoft Entra (formerly Azure Active Directory).
- Interactive authentication uses a browser, so the registered application should have a **redirect URI** set to _'https://localhost'_ under mobile and native applications.
- This mode of authentication is not suitable for CI/CD or other background service scenarios where manual authentication isn't feasible.
- API Permission required for registered app:
> Microsoft Graph --> Delegated Permissions --> Files.Read.All
#### 2. App Authentication: Client ID and client secret based authentication
- You need to create an app registration in Microsoft Entra (formerly Azure Active Directory).
- For silent authentication to work, you also need to create a client secret for the app.
- Microsoft does not currently support this mode of authentication for OneDrive Personal, so it can be used only with OneDrive for Business (Microsoft 365).
- API Permission required for registered app:
> Microsoft Graph --> Application Permissions --> Files.Read.All (**Grant Admin Consent**)
>
> Microsoft Graph --> Application Permissions --> User.Read.All (**Grant Admin Consent**)
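For orientation, the two modes correspond to MSAL's public-client (interactive) and confidential-client (client credentials) flows. The following is a minimal sketch of those flows using the `msal` package directly. It is not taken from the loader's source, and whether OneDriveReader calls MSAL in exactly this way is an assumption. `authority_url` and `acquire_token` are hypothetical helper names.

```python
from typing import Optional

# Scope that asks for every application permission granted to the app.
GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def authority_url(tenant_id: str = "consumers") -> str:
    """Build the Microsoft identity platform authority for a tenant.

    "consumers" targets personal Microsoft accounts; a tenant ID targets
    a specific organization.
    """
    return f"https://login.microsoftonline.com/{tenant_id}"


def acquire_token(
    client_id: str,
    client_secret: Optional[str] = None,
    tenant_id: str = "consumers",
) -> dict:
    """Return MSAL's token response for the matching authentication mode."""
    import msal  # imported lazily; requires `pip install msal`

    authority = authority_url(tenant_id)
    if client_secret is None:
        # User authentication: opens a browser for the user to sign in.
        app = msal.PublicClientApplication(client_id, authority=authority)
        return app.acquire_token_interactive(scopes=["Files.Read.All"])
    # App authentication: silent, uses the client secret.
    app = msal.ConfidentialClientApplication(
        client_id, client_credential=client_secret, authority=authority
    )
    return app.acquire_token_for_client(scopes=[GRAPH_DEFAULT_SCOPE])
```

On success, the returned dictionary contains an `access_token` entry; on failure, MSAL returns `error` and `error_description` entries instead.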
## Usage
### OneDrive Personal
https://onedrive.live.com/
> Note: If you are trying to connect to OneDrive Personal, you can initialize OneDriveReader with just your client_id and interactive login. Microsoft _doesn't_ support App authentication for OneDrive Personal currently.
#### folder_id
You can extract a folder_id directly from its URL.
For example, the folder_id of `https://onedrive.live.com/?id=B5AF52B769DFDE4%216107&cid=0B5AF52B769DFDdRE4` is `B5AF52B769DFDE4%216107`.
#### file_id
You can extract a file_id directly from its preview URL.
For example, the file_id of `https://onedrive.live.com/?cid=0B5AF52BE769DFDE4&id=B5AF52B769DFDE4%216106&parId=root&o=OneUp` is `B5AF52B769DFDE4%216106`.
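Pulling the ID out of a URL by hand is error-prone, so here is a small sketch that does it with the standard library. `extract_onedrive_id` is a hypothetical helper, not part of the loader, and it assumes the URL carries the ID in an `id` query parameter as in the examples above. It deliberately avoids `urllib.parse.parse_qs`, which would decode `%21` to `!`, because the examples above keep the ID percent-encoded.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_onedrive_id(url: str) -> Optional[str]:
    """Return the raw (still percent-encoded) value of the `id` query parameter."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "id":
            return value
    return None  # the URL has no `id` parameter


folder_url = "https://onedrive.live.com/?id=B5AF52B769DFDE4%216107&cid=0B5AF52B769DFDdRE4"
file_url = "https://onedrive.live.com/?cid=0B5AF52BE769DFDE4&id=B5AF52B769DFDE4%216106&parId=root&o=OneUp"

print(extract_onedrive_id(folder_url))  # B5AF52B769DFDE4%216107
print(extract_onedrive_id(file_url))    # B5AF52B769DFDE4%216106
```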
#### OneDrive Personal Example Usage:
```python
from llama_index.readers.microsoft_onedrive import OneDriveReader
# User authentication flow: replace the client ID with your own.
loader = OneDriveReader(client_id="82ee706e-2439-47fa-877a-95048ead9318")
# App authentication flow: not supported by Microsoft for OneDrive Personal.

# Get all documents, including those in subfolders.
documents = loader.load_data()

# Get documents by folder_id. Subfolders are traversed by default
# (recursive=True); set recursive=False to skip them.
documents = loader.load_data(folder_id="folderid", recursive=False)

# Get documents by file IDs.
documents = loader.load_data(file_ids=["fileid1", "fileid2"])
```
### OneDrive For Business
https://portal.office.com/onedrive
> Note: If you are an organization trying to connect to OneDrive for Business (part of Microsoft 365), you need to:
1. Initialize OneDriveReader with the correct **tenant_id**, along with a client_id and client_secret registered for the tenant.
2. Invoke the load_data method with **userprincipalname** (the organization-provided email in most cases).
#### folder_path
The relative path of a subfolder from the root folder (Documents).
For example:
- The path of the first-level subfolder "drice co" (inside the root folder), whose URL is `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co`, is **drice%20co**
- The path of the second-level subfolder "test" (inside the "drice co" subfolder), whose URL is `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test`, is **drice%20co/test**
#### file_path
The relative path of a file from the root folder (Documents).
For example, the path of the file "demo_doc.docx" inside the "test" subfolder from the previous example, whose URL is `https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com/_layouts/15/onedrive.aspx?id=/personal/godwin_foobar_onmicrosoft_com/Documents/drice%20co/test/demo_doc.docx`, is **drice%20co/test/demo_doc.docx**
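The same rule applies to folders and files: the path is whatever follows `/Documents/` in the URL's `id` parameter. A minimal sketch of that rule follows. `relative_path_from_url` is a hypothetical helper, not part of the loader, and it assumes the URL has the shape shown in the examples above, with unencoded slashes in the `id` value.

```python
from typing import Optional
from urllib.parse import urlsplit


def relative_path_from_url(url: str, root: str = "/Documents/") -> Optional[str]:
    """Return the part of the `id` query parameter that follows the root folder."""
    query = urlsplit(url).query
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "id" and root in value:
            # Keep the percent-encoding (e.g. %20) exactly as it appears in the URL.
            return value.split(root, 1)[1]
    return None  # no `id` parameter, or it does not point below the root folder


base = (
    "https://foobar-my.sharepoint.com/personal/godwin_foobar_onmicrosoft_com"
    "/_layouts/15/onedrive.aspx"
    "?id=/personal/godwin_foobar_onmicrosoft_com/Documents/"
)

print(relative_path_from_url(base + "drice%20co/test"))
# drice%20co/test
print(relative_path_from_url(base + "drice%20co/test/demo_doc.docx"))
# drice%20co/test/demo_doc.docx
```

The returned string for a folder URL can be passed to `load_data` as `folder_path`.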
#### OneDrive For Business Example Usage:
```python
from llama_index.readers.microsoft_onedrive import OneDriveReader
loader = OneDriveReader(
    client_id="82ee706e-2439-47fa-877a-95048ead9318",
    tenant_id="02ee706f-2439-47fa-877a-95048ead9318",
    client_secret="YOUR_SECRET",
)

# Get all .docx or .pdf documents (subfolders included).
documents = loader.load_data(
    mime_types=[
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "application/pdf",
    ],
    userprincipalname="godwin@foobar.onmicrosoft.com",
)

# Get all documents from a folder in the given user's OneDrive for Business.
documents = loader.load_data(
    folder_path="subfolder/subfolder2",
    userprincipalname="godwin@foobar.onmicrosoft.com",
)

# Get documents by file path, using the user's userprincipalname
# (organization-provided email).
documents = loader.load_data(
    file_paths=[
        "subfolder/subfolder2/fileid1.pdf",
        "subfolder/subfolder3/fileid2.docx",
    ],
    userprincipalname="godwin@foobar.onmicrosoft.com",
)
```
#### Author
[Godwin Paul Vincent](https://github.com/godwin3737)
This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).