llama-index-readers-confluence


- Name: llama-index-readers-confluence
- Version: 0.2.1
- Summary: llama-index readers confluence integration
- Upload time: 2024-08-28 15:01:24
- Maintainer: zywilliamli
- Author: Your Name
- License: MIT
- Requires Python: `<4.0,>=3.8.1`
# Confluence Loader

```bash
pip install llama-index-readers-confluence
```

This loader loads pages from a given Confluence Cloud instance. To initialize the ConfluenceReader, the user needs to specify the base URL of the Confluence
instance; the base URL must end with `/wiki`.
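Because a base URL without the `/wiki` suffix is an easy mistake, it can be worth checking it before constructing the reader. The helper below, `normalize_base_url`, is hypothetical (it is not part of this package); it is a minimal sketch that only strips trailing slashes and enforces the `/wiki` rule stated above.

```python
from urllib.parse import urlparse


def normalize_base_url(url: str) -> str:
    """Hypothetical helper: strip trailing slashes and require a `/wiki` suffix."""
    cleaned = url.strip().rstrip("/")
    parsed = urlparse(cleaned)
    # Require an absolute http(s) URL such as https://yoursite.atlassian.net/wiki
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")
    if not parsed.path.endswith("/wiki"):
        raise ValueError(f"Confluence base URL must end with '/wiki': {url!r}")
    return cleaned


print(normalize_base_url("https://yoursite.atlassian.net/wiki/"))  # → https://yoursite.atlassian.net/wiki
```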

The user can optionally pass credentials (OAuth 2.0, an API token, or a username and password) to authenticate with the Confluence instance. If no credentials are
passed, the loader looks for the `CONFLUENCE_API_TOKEN` or `CONFLUENCE_USERNAME`/`CONFLUENCE_PASSWORD` environment variables
to proceed with basic authentication.

> [!NOTE]
> Keep in mind that `CONFLUENCE_PASSWORD` is not your actual password but an API token, which you can create at https://id.atlassian.com/manage-profile/security/api-tokens.

The following order is used for checking authentication credentials:

1. `oauth2`
2. `api_token`
3. `user_name` and `password`
4. Environment variable `CONFLUENCE_API_TOKEN`
5. Environment variable `CONFLUENCE_USERNAME` and `CONFLUENCE_PASSWORD`
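The precedence above can be sketched as a small pure-Python function. `resolve_auth_method` is hypothetical: it is not part of the package and only returns a label for the source that would win. It assumes a source counts as present when its value is non-empty, and that both halves of a username/password pair are required.

```python
import os
from typing import Mapping, Optional


def resolve_auth_method(
    oauth2: Optional[dict] = None,
    api_token: Optional[str] = None,
    user_name: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: name the credential source that wins under the documented order."""
    env = os.environ if env is None else env
    if oauth2:
        return "oauth2"
    if api_token:
        return "api_token"
    if user_name and password:
        return "user_name/password"
    # Environment variables are only consulted when no explicit credentials were passed.
    if env.get("CONFLUENCE_API_TOKEN"):
        return "env:CONFLUENCE_API_TOKEN"
    if env.get("CONFLUENCE_USERNAME") and env.get("CONFLUENCE_PASSWORD"):
        return "env:CONFLUENCE_USERNAME/CONFLUENCE_PASSWORD"
    raise ValueError("No Confluence credentials found")


# Explicit arguments take precedence over environment variables.
print(resolve_auth_method(api_token="abc", env={"CONFLUENCE_API_TOKEN": "xyz"}))  # → api_token
```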

For more on authenticating using OAuth 2.0, check out:

- https://atlassian-python-api.readthedocs.io/index.html
- https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/

Confluence pages are obtained in one of four mutually exclusive ways:

1. `page_ids`: Load all pages from a list of page IDs
2. `space_key`: Load all pages from a space
3. `label`: Load all pages with a given label
4. `cql`: Load all pages that match a given CQL query (Confluence Query Language https://developer.atlassian.com/cloud/confluence/advanced-searching-using-cql/ ).
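Since the four selectors are mutually exclusive, each `load_data` call should receive exactly one of them, which is how the usage examples below call it. The check can be sketched as follows; `select_page_source` is a hypothetical helper that only mirrors this rule and is not the reader's own validation code.

```python
from typing import List, Optional


def select_page_source(
    page_ids: Optional[List[str]] = None,
    space_key: Optional[str] = None,
    label: Optional[str] = None,
    cql: Optional[str] = None,
) -> str:
    """Hypothetical check: return the name of the one selector that was provided."""
    candidates = {"page_ids": page_ids, "space_key": space_key, "label": label, "cql": cql}
    provided = [name for name, value in candidates.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of page_ids, space_key, label or cql; "
            f"got {provided or 'none'}"
        )
    return provided[0]


print(select_page_source(space_key="ENG"))  # → space_key
```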

When `page_ids` is specified, setting `include_children` makes the loader also load all descendant pages.
When `space_key` is specified, `page_status` further restricts the status of the pages to load: `None`, `'current'`, `'archived'`, or `'draft'`.
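Note that `include_children` covers all descendants (grandchildren and deeper), not only direct children. The traversal can be sketched over a hypothetical in-memory parent-to-children map; the real reader fetches child pages from the Confluence API, and `expand_with_descendants` is an illustration, not its implementation.

```python
from collections import deque
from typing import Dict, List


def expand_with_descendants(page_ids: List[str], children: Dict[str, List[str]]) -> List[str]:
    """Return page_ids plus every descendant, breadth-first, without duplicates."""
    ordered: List[str] = []
    visited = set()
    queue = deque(page_ids)
    while queue:
        page_id = queue.popleft()
        if page_id in visited:
            continue
        visited.add(page_id)
        ordered.append(page_id)
        # Queue this page's children so deeper levels are visited as well.
        queue.extend(children.get(page_id, []))
    return ordered


# Hypothetical page tree: "100" has children "101" and "102"; "101" has child "103".
tree = {"100": ["101", "102"], "101": ["103"]}
print(expand_with_descendants(["100"], tree))  # → ['100', '101', '102', '103']
```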

- `limit` (int): Deprecated; use `max_num_results` instead.
- `max_num_results` (int): Maximum number of results to return. If `None`, all results are returned. Requests are made in batches until the desired number of results is reached.
- `start` (int): The offset at which to start fetching pages. Only works with `space_key`.
- `cursor` (str): An alternative to `start` for CQL queries. The cursor points to the next "page" of results when searching Atlassian products; after a search, the cursor for the next page can be obtained with `get_next_cursor()`.
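The interplay of `start` and `max_num_results` with batched requests can be sketched against a fake paged source. `fetch_in_batches`, `fake_fetch`, and the `batch_size` of 50 are hypothetical and are not the reader's actual request size or code; the sketch only shows offset-based batching that stops at `max_num_results` or when a short batch signals the end.

```python
from typing import Callable, List, Optional


def fetch_in_batches(
    fetch: Callable[[int, int], List[str]],
    start: int = 0,
    max_num_results: Optional[int] = None,
    batch_size: int = 50,
) -> List[str]:
    """Collect up to max_num_results items (all if None), beginning at offset `start`."""
    results: List[str] = []
    offset = start
    while max_num_results is None or len(results) < max_num_results:
        remaining = None if max_num_results is None else max_num_results - len(results)
        limit = batch_size if remaining is None else min(batch_size, remaining)
        batch = fetch(offset, limit)
        results.extend(batch)
        offset += len(batch)
        if len(batch) < limit:  # a short (or empty) batch means no more results
            break
    return results


# Stand-in for a paged API: 12 fake pages served by offset and limit.
ALL_PAGES = [f"page-{i}" for i in range(12)]


def fake_fetch(offset: int, limit: int) -> List[str]:
    return ALL_PAGES[offset : offset + limit]


print(fetch_in_batches(fake_fetch, start=0, max_num_results=5, batch_size=2))
# → ['page-0', 'page-1', 'page-2', 'page-3', 'page-4']
```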

The user can also set the boolean `include_attachments` to include attachments. It defaults to `False`; if set to `True`, all attachments are downloaded, and
ConfluenceReader extracts the text from the attachments and adds it to the Document object.
Currently supported attachment types are PDF, PNG, JPEG/JPG, SVG, Word, and Excel.
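To predict which attachments can contribute text, a filename can be checked against the supported types. This is a sketch under an explicit assumption: the extension-to-type mapping in `SUPPORTED_EXTENSIONS` is hypothetical (the reader may decide based on the attachment's media type instead), and `attachment_kind` is not part of the package.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from file extensions to the supported attachment types listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".png": "PNG",
    ".jpg": "JPEG/JPG",
    ".jpeg": "JPEG/JPG",
    ".svg": "SVG",
    ".doc": "Word",
    ".docx": "Word",
    ".xls": "Excel",
    ".xlsx": "Excel",
}


def attachment_kind(filename: str) -> Optional[str]:
    """Return the supported type for a filename (case-insensitive), or None if unsupported."""
    return SUPPORTED_EXTENSIONS.get(PurePosixPath(filename).suffix.lower())


names = ["Design.PDF", "diagram.svg", "budget.xlsx", "notes.txt"]
print({name: attachment_kind(name) for name in names})
# → {'Design.PDF': 'PDF', 'diagram.svg': 'SVG', 'budget.xlsx': 'Excel', 'notes.txt': None}
```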

Hint: `space_key` and `page_id` can both be found in the URL of a Confluence page: `https://yoursite.atlassian.net/wiki/spaces/<space_key>/pages/<page_id>`
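Following the hint, both values can be pulled out of a page URL programmatically. `parse_page_url` is a hypothetical helper based only on the URL shape shown above; it assumes the page ID may be followed by an optional title segment, as in typical Confluence Cloud page links.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Matches /wiki/spaces/<space_key>/pages/<page_id>, optionally followed by /<title>.
PAGE_PATH = re.compile(
    r"^/wiki/spaces/(?P<space_key>[^/]+)/pages/(?P<page_id>[^/]+)(?:/.*)?$"
)


def parse_page_url(url: str) -> Tuple[str, str]:
    """Return (space_key, page_id) from a Confluence page URL."""
    match = PAGE_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"Not a Confluence page URL: {url}")
    return match.group("space_key"), match.group("page_id")


print(parse_page_url("https://yoursite.atlassian.net/wiki/spaces/ENG/pages/123456/Release+Notes"))
# → ('ENG', '123456')
```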

## Usage

Here's an example usage of the ConfluenceReader.

```python
# Example that reads all current pages from a space, then the pages in `page_ids`
# together with their descendant pages
from llama_index.readers.confluence import ConfluenceReader

token = {"access_token": "<access_token>", "token_type": "<token_type>"}
oauth2_dict = {"client_id": "<client_id>", "token": token}

base_url = "https://yoursite.atlassian.net/wiki"

page_ids = ["<page_id_1>", "<page_id_2>", "<page_id_3>"]
space_key = "<space_key>"

reader = ConfluenceReader(base_url=base_url, oauth2=oauth2_dict)
documents = reader.load_data(
    space_key=space_key, include_attachments=True, page_status="current"
)
documents.extend(
    reader.load_data(
        page_ids=page_ids, include_children=True, include_attachments=True
    )
)
```
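The example above loads a whole space and then a list of pages with their children, so if any of those pages belong to the same space, the combined `documents` list can contain the same page twice. An order-preserving de-duplication can be sketched as follows. It assumes each loaded document carries a stable ID identifying its page; `Doc` is a hypothetical stand-in, not LlamaIndex's `Document` class.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document; only the ID matters here."""

    doc_id: str
    text: str


def dedupe_by_id(docs: List[Doc]) -> List[Doc]:
    """Keep the first document seen for each doc_id, preserving the original order."""
    unique: Dict[str, Doc] = {}
    for doc in docs:
        unique.setdefault(doc.doc_id, doc)
    return list(unique.values())


space_docs = [Doc("1", "Home"), Doc("2", "Runbook")]
page_docs = [Doc("2", "Runbook"), Doc("3", "Runbook child")]
print([d.doc_id for d in dedupe_by_id(space_docs + page_docs)])  # → ['1', '2', '3']
```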

```python
# Example that fetches the first 5, then the next 5 pages from a space
from llama_index.readers.confluence import ConfluenceReader

token = {"access_token": "<access_token>", "token_type": "<token_type>"}
oauth2_dict = {"client_id": "<client_id>", "token": token}

base_url = "https://yoursite.atlassian.net/wiki"

space_key = "<space_key>"

reader = ConfluenceReader(base_url=base_url, oauth2=oauth2_dict)
documents = reader.load_data(
    space_key=space_key,
    include_attachments=True,
    page_status="current",
    start=0,
    max_num_results=5,
)
documents.extend(
    reader.load_data(
        space_key=space_key,
        page_status="current",
        include_attachments=True,
        start=5,
        max_num_results=5,
    )
)
```
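The same `start`/`max_num_results` pattern extends to a loop that walks an entire space in fixed-size windows. To keep the sketch runnable without a Confluence site, `FakeSpaceReader` is a hypothetical stand-in that only mimics the `load_data(space_key=..., start=..., max_num_results=...)` call shape used above, and the loop assumes that a window shorter than requested means the space is exhausted.

```python
from typing import Iterator, List


class FakeSpaceReader:
    """Hypothetical stand-in that serves a fixed list of pages by offset."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages

    def load_data(
        self, space_key: str, start: int = 0, max_num_results: int = 5
    ) -> List[str]:
        # The fake ignores `space_key`; a real reader would query that space.
        return self.pages[start : start + max_num_results]


def iter_space_windows(reader, space_key: str, window: int = 5) -> Iterator[List[str]]:
    """Yield successive windows of pages, stopping after the first short window."""
    start = 0
    while True:
        batch = reader.load_data(space_key=space_key, start=start, max_num_results=window)
        if batch:
            yield batch
        if len(batch) < window:
            return
        start += window


reader = FakeSpaceReader([f"page-{i}" for i in range(12)])
print([len(w) for w in iter_space_windows(reader, "ENG")])  # → [5, 5, 2]
```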

```python
# Example that fetches the first 5 results from a CQL query, then uses the cursor to fetch the next 5
from llama_index.readers.confluence import ConfluenceReader

token = {"access_token": "<access_token>", "token_type": "<token_type>"}
oauth2_dict = {"client_id": "<client_id>", "token": token}

base_url = "https://yoursite.atlassian.net/wiki"

cql = 'type="page" AND label="devops"'

reader = ConfluenceReader(base_url=base_url, oauth2=oauth2_dict)
documents = reader.load_data(cql=cql, max_num_results=5)
cursor = reader.get_next_cursor()
documents.extend(reader.load_data(cql=cql, cursor=cursor, max_num_results=5))
```
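Following the cursor repeatedly retrieves every match of a CQL query. The loop below is a sketch under stated assumptions: `FakeCqlReader` is a hypothetical stand-in that encodes an offset in its cursor (a real Atlassian cursor is an opaque string), and it assumes `get_next_cursor()` returns a falsy value once results are exhausted. The loop also stops on an empty batch as a guard in case that assumption does not hold.

```python
from typing import List, Optional


class FakeCqlReader:
    """Hypothetical stand-in that mimics load_data(cql=..., cursor=...) and get_next_cursor()."""

    def __init__(self, results: List[str]) -> None:
        self.results = results
        self._next_cursor: Optional[str] = None

    def load_data(
        self, cql: str, cursor: Optional[str] = None, max_num_results: int = 5
    ) -> List[str]:
        # The fake ignores `cql` and stores the next offset as its cursor.
        offset = int(cursor) if cursor else 0
        batch = self.results[offset : offset + max_num_results]
        end = offset + len(batch)
        # Assumption: no cursor is reported once the results are exhausted.
        self._next_cursor = str(end) if end < len(self.results) else None
        return batch

    def get_next_cursor(self) -> Optional[str]:
        return self._next_cursor


def load_all_with_cursor(reader, cql: str, batch_size: int = 5) -> List[str]:
    """Follow the cursor until none is returned (or a batch comes back empty)."""
    documents = reader.load_data(cql=cql, max_num_results=batch_size)
    cursor = reader.get_next_cursor()
    while cursor:
        batch = reader.load_data(cql=cql, cursor=cursor, max_num_results=batch_size)
        if not batch:  # guard against a cursor that leads to no further results
            break
        documents.extend(batch)
        cursor = reader.get_next_cursor()
    return documents


reader = FakeCqlReader([f"result-{i}" for i in range(12)])
print(len(load_all_with_cursor(reader, 'type="page" AND label="devops"')))  # → 12
```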

This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-index-readers-confluence",
    "maintainer": "zywilliamli",
    "docs_url": null,
    "requires_python": "<4.0,>=3.8.1",
    "maintainer_email": null,
    "keywords": null,
    "author": "Your Name",
    "author_email": "you@example.com",
    "download_url": "https://files.pythonhosted.org/packages/92/e5/d5e18f7792ca9d48797c9cf2032d8db0311129a6a2e86c8266bd84c351c4/llama_index_readers_confluence-0.2.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index readers confluence integration",
    "version": "0.2.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "fc9a8a56978351646fab8161988f30339218be14c31b89440d0970354caaf7c1",
                "md5": "1cc193402085eba2d195ec86b8856167",
                "sha256": "3d66f14147a4f44f739b7cc19179dde86bc556d581164245de8df5d3bedd9f43"
            },
            "downloads": -1,
            "filename": "llama_index_readers_confluence-0.2.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "1cc193402085eba2d195ec86b8856167",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.8.1",
            "size": 9714,
            "upload_time": "2024-08-28T15:01:23",
            "upload_time_iso_8601": "2024-08-28T15:01:23.496367Z",
            "url": "https://files.pythonhosted.org/packages/fc/9a/8a56978351646fab8161988f30339218be14c31b89440d0970354caaf7c1/llama_index_readers_confluence-0.2.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "92e5d5e18f7792ca9d48797c9cf2032d8db0311129a6a2e86c8266bd84c351c4",
                "md5": "9cb0130759eabe02a0cab2311d21e2fb",
                "sha256": "90ef5ed5119d4df3cd38a9bec83e7c3e70342d0229d4c543ec142a1b17c68cef"
            },
            "downloads": -1,
            "filename": "llama_index_readers_confluence-0.2.1.tar.gz",
            "has_sig": false,
            "md5_digest": "9cb0130759eabe02a0cab2311d21e2fb",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.8.1",
            "size": 10518,
            "upload_time": "2024-08-28T15:01:24",
            "upload_time_iso_8601": "2024-08-28T15:01:24.789699Z",
            "url": "https://files.pythonhosted.org/packages/92/e5/d5e18f7792ca9d48797c9cf2032d8db0311129a6a2e86c8266bd84c351c4/llama_index_readers_confluence-0.2.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-08-28 15:01:24",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-readers-confluence"
}