llama-index-readers-confluence

Name	llama-index-readers-confluence
Version	0.3.1
home_page	None
Summary	llama-index readers confluence integration
upload_time	2024-12-16 15:35:31
maintainer	zywilliamli
docs_url	None
author	Your Name
requires_python	<4.0,>=3.9
license	MIT
requirements	No requirements were recorded.

# Confluence Loader

```bash
pip install llama-index-readers-confluence
```

This loader loads pages from a given Confluence Cloud instance. To initialize the ConfluenceReader, the user needs to specify
the base URL of the Confluence instance. The base URL must end with `/wiki`.

The user can optionally specify OAuth 2.0 credentials to authenticate with the Confluence instance. If no credentials are
specified, the loader will look for `CONFLUENCE_API_TOKEN` or `CONFLUENCE_USERNAME`/`CONFLUENCE_PASSWORD` environment variables
to proceed with basic authentication.

> [!NOTE]
> Keep in mind `CONFLUENCE_PASSWORD` is not your actual password, but an API Token obtained here: https://id.atlassian.com/manage-profile/security/api-tokens.

The following order is used for checking authentication credentials:

1. `oauth2`
2. `api_token`
3. `cookies`
4. `user_name` and `password`
5. Environment variable `CONFLUENCE_API_TOKEN`
6. Environment variable `CONFLUENCE_USERNAME` and `CONFLUENCE_PASSWORD`
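
The lookup order above can be sketched as a small standalone function. This is only an illustration of the documented order, not the library's actual code: `resolve_auth_method`, its `env` parameter, and the returned labels are hypothetical names introduced for this example.

```python
import os
from typing import Mapping, Optional


def resolve_auth_method(
    oauth2: Optional[dict] = None,
    api_token: Optional[str] = None,
    cookies: Optional[dict] = None,
    user_name: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a label for the first credential source found.

    Hypothetical helper: it mirrors the order listed in this README and
    does not call into llama-index or Confluence.
    """
    env = os.environ if env is None else env
    if oauth2:
        return "oauth2"
    if api_token:
        return "api_token"
    if cookies:
        return "cookies"
    if user_name and password:
        return "user_name/password"
    if env.get("CONFLUENCE_API_TOKEN"):
        return "env:CONFLUENCE_API_TOKEN"
    if env.get("CONFLUENCE_USERNAME") and env.get("CONFLUENCE_PASSWORD"):
        return "env:CONFLUENCE_USERNAME/CONFLUENCE_PASSWORD"
    raise ValueError("no Confluence credentials found")


# Explicit arguments win over environment variables, and `api_token`
# is checked before `user_name`/`password`.
print(resolve_auth_method(api_token="<api_token>", user_name="me", password="x"))
```

Passing `env` explicitly keeps the sketch easy to try without touching real environment variables.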

For more on authenticating using OAuth 2.0, check out:

- https://atlassian-python-api.readthedocs.io/index.html
- https://developer.atlassian.com/cloud/confluence/oauth-2-3lo-apps/

Confluence pages are obtained in one of four mutually exclusive ways:

1. `page_ids`: Load all pages from a list of page ids
2. `space_key`: Load all pages from a space
3. `label`: Load all pages with a given label
4. `cql`: Load all pages that match a given CQL query (Confluence Query Language https://developer.atlassian.com/cloud/confluence/advanced-searching-using-cql/ ).

When `page_ids` is specified, `include_children` will cause the loader to also load all descendant pages.
When `space_key` is specified, `page_status` further specifies the status of pages to load: `None`, `'current'`, `'archived'` or `'draft'`.
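
The rules above (exactly one selector, `include_children` only with `page_ids`, `page_status` only with `space_key`) can be checked before calling the loader. The sketch below is a hypothetical pre-flight helper written for this README; `check_load_args` is not part of the package, and this README does not state whether the loader performs the same checks itself.

```python
from typing import Optional, Sequence


def check_load_args(
    page_ids: Optional[Sequence[str]] = None,
    space_key: Optional[str] = None,
    label: Optional[str] = None,
    cql: Optional[str] = None,
    include_children: bool = False,
    page_status: Optional[str] = None,
) -> str:
    """Return the name of the single selector in use, or raise ValueError."""
    selectors = {
        "page_ids": page_ids,
        "space_key": space_key,
        "label": label,
        "cql": cql,
    }
    chosen = [name for name, value in selectors.items() if value is not None]
    if len(chosen) != 1:
        raise ValueError(
            f"expected exactly one of {list(selectors)}, got {chosen or 'none'}"
        )
    if include_children and chosen[0] != "page_ids":
        raise ValueError("include_children only applies together with page_ids")
    if page_status is not None and chosen[0] != "space_key":
        raise ValueError("page_status only applies together with space_key")
    if page_status not in (None, "current", "archived", "draft"):
        raise ValueError(f"unknown page_status: {page_status!r}")
    return chosen[0]


print(check_load_args(space_key="<space_key>", page_status="current"))  # space_key
```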

The following parameters control how many results are loaded and where loading starts:

- `limit` (int): Deprecated; use `max_num_results` instead.
- `max_num_results` (int): Maximum number of results to return. If `None`, all results are returned. Requests are made in batches to reach the desired number of results.
- `start` (int): Offset to start from when getting pages. Only works with `space_key`.
- `cursor` (str): An alternative to `start` for CQL queries. The cursor is a pointer to the next "page" of results when searching Atlassian products. After a search, the current cursor can be retrieved with `get_next_cursor()`.
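
As a rough illustration of offset-based paging with `start` and `max_num_results`, the sketch below walks through a space in fixed-size batches. `iter_batches` and `load_batch` are hypothetical names; with a real reader, `load_batch` would wrap something like `reader.load_data(space_key=..., start=start, max_num_results=n)`. It assumes each call returns one item per page, so that the batch length can be used to advance the offset.

```python
from typing import Callable, Iterator, List


def iter_batches(
    load_batch: Callable[[int, int], List], batch_size: int
) -> Iterator[List]:
    """Yield successive batches until one comes back short or empty."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 0
    while True:
        batch = load_batch(start, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            # A short batch means there is nothing left to fetch.
            return
        start += len(batch)


# Stand-in for a Confluence space with 12 pages, so the sketch runs offline.
fake_pages = [f"page-{i}" for i in range(12)]


def fake_load(start: int, n: int) -> List[str]:
    return fake_pages[start : start + n]


print([len(batch) for batch in iter_batches(fake_load, 5)])  # [5, 5, 2]
```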

The user can also set the boolean `include_attachments` to include attachments. It is `False` by default.
If set to `True`, all attachments are downloaded, and ConfluenceReader extracts the text from the attachments
and adds it to the Document object.
Currently supported attachment types are: PDF, PNG, JPEG/JPG, SVG, Word and Excel.

Hint: `space_key` and `page_id` can both be found in the URL of a page in Confluence: `https://yoursite.atlassian.com/wiki/spaces/<space_key>/pages/<page_id>`
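
A minimal sketch of pulling both values out of such a URL with the standard library follows. `parse_page_url` is a hypothetical helper, and it assumes the page id is numeric and that anything after it (such as a title segment) can be ignored.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Matches the path layout shown in the hint above.
_PAGE_PATH = re.compile(
    r"/wiki/spaces/(?P<space_key>[^/]+)/pages/(?P<page_id>\d+)"
)


def parse_page_url(url: str) -> Tuple[str, str]:
    """Return (space_key, page_id) extracted from a Confluence page URL."""
    match = _PAGE_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a recognised Confluence page URL: {url!r}")
    return match.group("space_key"), match.group("page_id")


space_key, page_id = parse_page_url(
    "https://yoursite.atlassian.com/wiki/spaces/ENG/pages/123456/Release+Notes"
)
print(space_key, page_id)  # ENG 123456
```

The page id is returned as a string, matching the string ids used in the `page_ids` examples in this README.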

## Usage

Here are some example usages of the ConfluenceReader.

```python
# Example that reads all current pages of a space, then the pages in `page_ids` and their children
from llama_index.readers.confluence import ConfluenceReader

token = {"access_token": "<access_token>", "token_type": "<token_type>"}
oauth2_dict = {"client_id": "<client_id>", "token": token}

base_url = "https://yoursite.atlassian.com/wiki"

page_ids = ["<page_id_1>", "<page_id_2>", "<page_id_3>"]
space_key = "<space_key>"

reader = ConfluenceReader(
    base_url=base_url,
    oauth2=oauth2_dict,
    client_args={"backoff_and_retry": True},
)
documents = reader.load_data(
    space_key=space_key, include_attachments=True, page_status="current"
)
documents.extend(
    reader.load_data(
        page_ids=page_ids, include_children=True, include_attachments=True
    )
)
```

```python
# Example that fetches the first 5, then the next 5 pages from a space
from llama_index.readers.confluence import ConfluenceReader

token = {"access_token": "<access_token>", "token_type": "<token_type>"}
oauth2_dict = {"client_id": "<client_id>", "token": token}

base_url = "https://yoursite.atlassian.com/wiki"

space_key = "<space_key>"

reader = ConfluenceReader(base_url=base_url, oauth2=oauth2_dict)
documents = reader.load_data(
    space_key=space_key,
    include_attachments=True,
    page_status="current",
    start=0,
    max_num_results=5,
)
documents.extend(
    reader.load_data(
        space_key=space_key,
        page_status="current",
        include_attachments=True,
        start=5,
        max_num_results=5,
    )
)
```

```python
# Example that fetches the first 5 results from a CQL query, then uses the cursor to fetch the next 5
from llama_index.readers.confluence import ConfluenceReader

token = {"access_token": "<access_token>", "token_type": "<token_type>"}
oauth2_dict = {"client_id": "<client_id>", "token": token}

base_url = "https://yoursite.atlassian.com/wiki"

cql = 'type="page" AND label="devops"'

reader = ConfluenceReader(base_url=base_url, oauth2=oauth2_dict)
documents = reader.load_data(cql=cql, max_num_results=5)
cursor = reader.get_next_cursor()
documents.extend(reader.load_data(cql=cql, cursor=cursor, max_num_results=5))
```
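
The cursor pattern above can be extended into a loop that keeps fetching until no cursor is left. This is a sketch under a stated assumption: it assumes `get_next_cursor()` returns a falsy value (such as `None`) once there are no further results, which this README does not confirm. `load_all_cql` and `max_batches` are hypothetical names, and `max_batches` guards against looping forever if that assumption does not hold.

```python
from typing import Any, List


def load_all_cql(
    reader: Any, cql: str, batch_size: int = 50, max_batches: int = 100
) -> List[Any]:
    """Collect results for a CQL query by following the reader's cursor.

    `reader` is expected to offer `load_data(cql=..., cursor=...,
    max_num_results=...)` and `get_next_cursor()`, as in the example above.
    """
    documents = list(reader.load_data(cql=cql, max_num_results=batch_size))
    for _ in range(max_batches - 1):
        cursor = reader.get_next_cursor()
        if not cursor:
            # Assumption: a falsy cursor means the search is exhausted.
            break
        batch = reader.load_data(
            cql=cql, cursor=cursor, max_num_results=batch_size
        )
        if not batch:
            break
        documents.extend(batch)
    return documents
```

With the reader from the example above, this would be called as `load_all_cql(reader, cql, batch_size=5)`.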

This loader is designed to be used as a way to load data into [LlamaIndex](https://github.com/run-llama/llama_index/).

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "llama-index-readers-confluence",
    "maintainer": "zywilliamli",
    "docs_url": null,
    "requires_python": "<4.0,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": "Your Name",
    "author_email": "you@example.com",
    "download_url": "https://files.pythonhosted.org/packages/a1/94/94bfa83faa88761c03ad2a4493f577e16547ecb280b6b18cb6b316b6ce06/llama_index_readers_confluence-0.3.1.tar.gz",
    "platform": null,
    "description": "(README text, same as the Confluence Loader section above)",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "llama-index readers confluence integration",
    "version": "0.3.1",
    "project_urls": null,
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "86e3ad3342a93ba75d58a61947a342a234e150a9a3dcd6071d83d6749991f023",
                "md5": "22a1e6ca7adddfec4c6b575c7321a38e",
                "sha256": "37faea76e498d1ea5bc8138482ba5e67836abd8ddc13a0cc94a3c224ac33d2e4"
            },
            "downloads": -1,
            "filename": "llama_index_readers_confluence-0.3.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "22a1e6ca7adddfec4c6b575c7321a38e",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4.0,>=3.9",
            "size": 9904,
            "upload_time": "2024-12-16T15:35:28",
            "upload_time_iso_8601": "2024-12-16T15:35:28.809419Z",
            "url": "https://files.pythonhosted.org/packages/86/e3/ad3342a93ba75d58a61947a342a234e150a9a3dcd6071d83d6749991f023/llama_index_readers_confluence-0.3.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "a19494bfa83faa88761c03ad2a4493f577e16547ecb280b6b18cb6b316b6ce06",
                "md5": "1ee7b317a03c9284f5d54af808a46e55",
                "sha256": "26ecb8bacb1f9b504d3f74bfbd09884486ce8b84178d84e84a630ff0b9c91b05"
            },
            "downloads": -1,
            "filename": "llama_index_readers_confluence-0.3.1.tar.gz",
            "has_sig": false,
            "md5_digest": "1ee7b317a03c9284f5d54af808a46e55",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": "<4.0,>=3.9",
            "size": 10859,
            "upload_time": "2024-12-16T15:35:31",
            "upload_time_iso_8601": "2024-12-16T15:35:31.145295Z",
            "url": "https://files.pythonhosted.org/packages/a1/94/94bfa83faa88761c03ad2a4493f577e16547ecb280b6b18cb6b316b6ce06/llama_index_readers_confluence-0.3.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-16 15:35:31",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "llama-index-readers-confluence"
}
