nucliadb-sdk


Namenucliadb-sdk JSON
Version 3.0.3.post457 PyPI version JSON
download
home_pagehttps://docs.nuclia.dev/docs/guides/nucliadb/python_nucliadb_sdk
SummaryNone
upload_time2024-04-26 09:34:50
maintainerNone
docs_urlNone
authorNone
requires_python<4,>=3.8
licenseBSD
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage No coveralls.
            # NucliaDB SDK

The NucliaDB SDK is a Python library designed as a thin wrapper around the [NucliaDB HTTP API](https://docs.nuclia.dev/docs/api). It is tailored for developers who wish to create low-level scripts to interact with NucliaDB.

## WARNING

⚠ If it's your first time using Nuclia or you want a simple way to push your unstructured data to Nuclia with a script or a CLI, we highly recommend using the [Nuclia CLI/SDK](https://github.com/nuclia/nuclia.py) instead, as it is much more user-friendly and use-case focused. ⚠

## Installation

To install it, simply with pip:

```bash
pip install nucliadb-sdk
```

## How to use it?

To connect to a Nuclia-hosted NucliaDB instance, just use the `NucliaDB` constructor method with the `api_key`:

```python
from nucliadb_sdk import NucliaDB, Region

ndb = NucliaDB(region=Region.EUROPE1, api_key="my-api-key")
```

Alternatively, to connect to a NucliaDB local installation, use:

```python
ndb = NucliaDB(region=Region.ON_PREM, api="http://localhost:8080/api")
```

Then, each method of the `NucliaDB` class maps to an HTTP endpoint of the NucliaDB API. The parameters it accepts correspond to the Pydantic models associated to the request body scheme of the endpoint.

You can use Python's built-in `help` directive to check the list of available methods with their signature:

```python
from nucliadb_sdk import NucliaDB

help(NucliaDB)
help(NucliaDB.chat)
help(NucliaDB.create_resource)
```

The method-to-endpoint mappings for the sdk are declared in-code [in the _NucliaDBBase class](https://github.com/nuclia/nucliadb/blob/main/nucliadb_sdk/nucliadb_sdk/v2/sdk.py).

For instance, to create a resource in your Knowledge Box, the endpoint is defined [here](https://docs.nuclia.dev/docs/api#tag/Resources/operation/Create_Resource_kb__kbid__resources_post).

It has a `{kbid}` path parameter and is expecting a json payload with some optional keys like `slug` or `title`, that are of type string. With `curl`, the command would be:

```bash
curl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{"slug":"my-resource","title":"My Resource"}' -H "Content-Type: application/json"
{"uuid":"fbdb10a79abc45c0b13400f5697ea2ba","seqid":1}
```

and with the NucliaDB sdk:

```python
>>> from nucliadb_sdk import NucliaDB
>>>
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> ndb.create_resource(kbid="my-kbid", slug="my-resource", title="My Resource")
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
```

Note that paths parameters are mapped as required keyword arguments of the `NucliaDB` class methods: hence the `kbid="my-kbid"`. Any other keyword arguments specified in the method will be sent along in the json request body of the HTTP request.

Alternatively, you can also define the `content` parameter and pass an instance of the Pydantic model that the endpoint expects:

```python
>>> from nucliadb_sdk import NucliaDB
>>> from nucliadb_models.writer import CreateResourcePayload
>>> 
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> content = CreateResourcePayload(slug="my-resource", title="My Resource")
>>> ndb.create_resource(kbid="my-kbid", content=content)
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
```

Query parameters can be passed too on each method with the `query_params` argument. For instance:

```python
>>> ndb.get_resource_by_id(kbid="my-kbid", rid="rid", query_params={"show": ["values"]})
```

### Example Usage

The following is a sample script that fetches the HTML of a website, extracts all links that it finds on it and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.

```python
from nucliadb_models.link import LinkField
from nucliadb_models.writer import CreateResourcePayload
import nucliadb_sdk
import requests
from bs4 import BeautifulSoup


def extract_links_from_url(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    unique_links = set()
    for link in soup.find_all("a"):
        unique_links.add(link.get("href"))
    return unique_links


def upload_link_to_nuclia(ndb, *, kbid, link, tags):
    try:
        title = link.replace("-", " ")
        slug = "-".join(tags) + "-" + link.split("/")[-1]
        content = CreateResourcePayload(
            title=title,
            slug=slug,
            links={
                "link": LinkField(
                    uri=link,
                    language="en",
                )
            },
        )
        ndb.create_resource(kbid=kbid, content=content)
        print(f"Resource created from {link}. Title={title} Slug={slug}")
    except nucliadb_sdk.exceptions.ConflictError:
        print(f"Resource already exists: {link} {slug}")
    except Exception as ex:
        print(f"Failed to create resource: {link} {slug}: {ex}")


def main(site):
    # Define the NucliaDB instance with region and URL
    ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080")

    # Loop through extracted links and upload to NucliaDB
    for link in extract_links_from_url(site):
        upload_link_to_nuclia(ndb, kbid="my-kb-id", link=link, tags=["news"])

if __name__ == "__main__":
    main(site="https://en.wikipedia.org/wiki/The_Lion_King")
```

After the data is pushed, the NucliaDB SDK could also be used to find answers on top of the extracted links.

```python
>>> import nucliadb_sdk
>>> 
>>> ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080")
>>> resp = ndb.chat(kbid="my-kb-id", query="What does Hakuna Matata mean?")
>>> print(resp.answer)
'Hakuna matata is actually a phrase in the East African language of Swahili that literally means “no trouble” or “no problems”.'
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://docs.nuclia.dev/docs/guides/nucliadb/python_nucliadb_sdk",
    "name": "nucliadb-sdk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.8",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": null,
    "download_url": null,
    "platform": null,
    "description": "# NucliaDB SDK\n\nThe NucliaDB SDK is a Python library designed as a thin wrapper around the [NucliaDB HTTP API](https://docs.nuclia.dev/docs/api). It is tailored for developers who wish to create low-level scripts to interact with NucliaDB.\n\n## WARNING\n\n\u26a0 If it's your first time using Nuclia or you want a simple way to push your unstructured data to Nuclia with a script or a CLI, we highly recommend using the [Nuclia CLI/SDK](https://github.com/nuclia/nuclia.py) instead, as it is much more user-friendly and use-case focused. \u26a0\n\n## Installation\n\nTo install it, simply with pip:\n\n```bash\npip install nucliadb-sdk\n```\n\n## How to use it?\n\nTo connect to a Nuclia-hosted NucliaDB instance, just use the `NucliaDB` constructor method with the `api_key`:\n\n```python\nfrom nucliadb_sdk import NucliaDB, Region\n\nndb = NucliaDB(region=Region.EUROPE1, api_key=\"my-api-key\")\n```\n\nAlternatively, to connect to a NucliaDB local installation, use:\n\n```python\nndb = NucliaDB(region=Region.ON_PREM, api=\"http://localhost:8080/api\")\n```\n\nThen, each method of the `NucliaDB` class maps to an HTTP endpoint of the NucliaDB API. The parameters it accepts correspond to the Pydantic models associated to the request body scheme of the endpoint.\n\nYou can use Python's built-in `help` directive to check the list of available methods with their signature:\n\n```python\nfrom nucliadb_sdk import NucliaDB\n\nhelp(NucliaDB)\nhelp(NucliaDB.chat)\nhelp(NucliaDB.create_resource)\n```\n\nThe method-to-endpoint mappings for the sdk are declared in-code\u00a0[in the _NucliaDBBase class](https://github.com/nuclia/nucliadb/blob/main/nucliadb_sdk/nucliadb_sdk/v2/sdk.py).\n\nFor instance, to create a resource in your Knowledge Box, the endpoint is defined [here](https://docs.nuclia.dev/docs/api#tag/Resources/operation/Create_Resource_kb__kbid__resources_post).\n\nIt has a `{kbid}` path parameter and is expecting a json payload with some optional keys like `slug` or `title`, that are of type string. With `curl`, the command would be:\n\n```bash\ncurl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{\"slug\":\"my-resource\",\"title\":\"My Resource\"}' -H \"Content-Type: application/json\"\n{\"uuid\":\"fbdb10a79abc45c0b13400f5697ea2ba\",\"seqid\":1}\n```\n\nand with the NucliaDB sdk:\n\n```python\n>>> from nucliadb_sdk import NucliaDB\n>>>\n>>> ndb = NucliaDB(region=\"on-prem\", url=\"http://localhost:8080/api\")\n>>> ndb.create_resource(kbid=\"my-kbid\", slug=\"my-resource\", title=\"My Resource\")\nResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)\n```\n\nNote that paths parameters are mapped as required keyword arguments of the `NucliaDB` class methods: hence the `kbid=\"my-kbid\"`. Any other keyword arguments specified in the method will be sent along in the json request body of the HTTP request.\n\nAlternatively, you can also define the `content` parameter and pass an instance of the Pydantic model that the endpoint expects:\n\n```python\n>>> from nucliadb_sdk import NucliaDB\n>>> from nucliadb_models.writer import CreateResourcePayload\n>>> \n>>> ndb = NucliaDB(region=\"on-prem\", url=\"http://localhost:8080/api\")\n>>> content = CreateResourcePayload(slug=\"my-resource\", title=\"My Resource\")\n>>> ndb.create_resource(kbid=\"my-kbid\", content=content)\nResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)\n```\n\nQuery parameters can be passed too on each method with the `query_params` argument. For instance:\n\n```python\n>>> ndb.get_resource_by_id(kbid=\"my-kbid\", rid=\"rid\", query_params={\"show\": [\"values\"]})\n```\n\n### Example Usage\n\nThe following is a sample script that fetches the HTML of a website, extracts all links that it finds on it and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.\n\n```python\nfrom nucliadb_models.link import LinkField\nfrom nucliadb_models.writer import CreateResourcePayload\nimport nucliadb_sdk\nimport requests\nfrom bs4 import BeautifulSoup\n\n\ndef extract_links_from_url(url):\n    response = requests.get(url)\n    soup = BeautifulSoup(response.text, \"html.parser\")\n    unique_links = set()\n    for link in soup.find_all(\"a\"):\n        unique_links.add(link.get(\"href\"))\n    return unique_links\n\n\ndef upload_link_to_nuclia(ndb, *, kbid, link, tags):\n    try:\n        title = link.replace(\"-\", \" \")\n        slug = \"-\".join(tags) + \"-\" + link.split(\"/\")[-1]\n        content = CreateResourcePayload(\n            title=title,\n            slug=slug,\n            links={\n                \"link\": LinkField(\n                    uri=link,\n                    language=\"en\",\n                )\n            },\n        )\n        ndb.create_resource(kbid=kbid, content=content)\n        print(f\"Resource created from {link}. Title={title} Slug={slug}\")\n    except nucliadb_sdk.exceptions.ConflictError:\n        print(f\"Resource already exists: {link} {slug}\")\n    except Exception as ex:\n        print(f\"Failed to create resource: {link} {slug}: {ex}\")\n\n\ndef main(site):\n    # Define the NucliaDB instance with region and URL\n    ndb = nucliadb_sdk.NucliaDB(region=\"on-prem\", url=\"http://localhost:8080\")\n\n    # Loop through extracted links and upload to NucliaDB\n    for link in extract_links_from_url(site):\n        upload_link_to_nuclia(ndb, kbid=\"my-kb-id\", link=link, tags=[\"news\"])\n\nif __name__ == \"__main__\":\n    main(site=\"https://en.wikipedia.org/wiki/The_Lion_King\")\n```\n\nAfter the data is pushed, the NucliaDB SDK could also be used to find answers on top of the extracted links.\n\n```python\n>>> import nucliadb_sdk\n>>> \n>>> ndb = nucliadb_sdk.NucliaDB(region=\"on-prem\", url=\"http://localhost:8080\")\n>>> resp = ndb.chat(kbid=\"my-kb-id\", query=\"What does Hakuna Matata mean?\")\n>>> print(resp.answer)\n'Hakuna matata is actually a phrase in the East African language of Swahili that literally means \u201cno trouble\u201d or \u201cno problems\u201d.'\n```\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": null,
    "version": "3.0.3.post457",
    "project_urls": {
        "Homepage": "https://docs.nuclia.dev/docs/guides/nucliadb/python_nucliadb_sdk"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "edd8ce2104f0560edc7a74fb38daf324b4d25c690883423545884fc6357eeaf7",
                "md5": "8a1f651dd4accc34d85b46c4b567ff74",
                "sha256": "c6b7ab6091dee61599d3ea6556f5910b258fa6c0526543d214467b8877f5605d"
            },
            "downloads": -1,
            "filename": "nucliadb_sdk-3.0.3.post457-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "8a1f651dd4accc34d85b46c4b567ff74",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.8",
            "size": 37384,
            "upload_time": "2024-04-26T09:34:50",
            "upload_time_iso_8601": "2024-04-26T09:34:50.251087Z",
            "url": "https://files.pythonhosted.org/packages/ed/d8/ce2104f0560edc7a74fb38daf324b4d25c690883423545884fc6357eeaf7/nucliadb_sdk-3.0.3.post457-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-04-26 09:34:50",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "nucliadb-sdk"
}
        
Elapsed time: 0.26430s