nucliadb-sdk


Namenucliadb-sdk JSON
Version 6.0.1.post2461 PyPI version JSON
download
home_pagehttps://github.com/nuclia/nucliadb
SummaryNone
upload_time2024-12-05 15:47:25
maintainerNone
docs_urlNone
authorNone
requires_python<4,>=3.9
licenseBSD
keywords
VCS
bugtrack_url
requirements No requirements were recorded.
Travis-CI No Travis.
coveralls test coverage
            # NucliaDB SDK

The NucliaDB SDK is a Python library designed as a thin wrapper around the [NucliaDB HTTP API](https://docs.nuclia.dev/docs/api). It is tailored for developers who wish to create low-level scripts to interact with NucliaDB.

## WARNING

⚠ If it's your first time using Nuclia or you want a simple way to push your unstructured data to Nuclia with a script or a CLI, we highly recommend using the [Nuclia CLI/SDK](https://github.com/nuclia/nuclia.py) instead, as it is much more user-friendly and use-case focused. ⚠

## Installation

To install it, simply with pip:

```bash
pip install nucliadb-sdk
```

## How to use it?

To connect to a Nuclia-hosted NucliaDB instance, just use the `NucliaDB` constructor method with the `api_key`:

```python
from nucliadb_sdk import NucliaDB, Region

ndb = NucliaDB(region=Region.EUROPE1, api_key="my-api-key")
```

Alternatively, to connect to a NucliaDB local installation, use:

```python
ndb = NucliaDB(region=Region.ON_PREM, api="http://localhost:8080/api")
```

Then, each method of the `NucliaDB` class maps to an HTTP endpoint of the NucliaDB API. The parameters it accepts correspond to the Pydantic models associated to the request body scheme of the endpoint.

The method-to-endpoint mappings for the sdk are declared in-code [in the _NucliaDBBase class](https://github.com/nuclia/nucliadb/blob/main/nucliadb_sdk/src/nucliadb_sdk/v2/sdk.py).

For instance, to create a resource in your Knowledge Box, the endpoint is defined [here](https://docs.nuclia.dev/docs/api#tag/Resources/operation/Create_Resource_kb__kbid__resources_post).

It has a `{kbid}` path parameter and is expecting a json payload with some optional keys like `slug` or `title`, that are of type string. With `curl`, the command would be:

```bash
curl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{"slug":"my-resource","title":"My Resource"}' -H "Content-Type: application/json"
{"uuid":"fbdb10a79abc45c0b13400f5697ea2ba","seqid":1}
```

and with the NucliaDB sdk:

```python
>>> from nucliadb_sdk import NucliaDB
>>>
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> ndb.create_resource(kbid="my-kbid", slug="my-resource", title="My Resource")
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
```

Note that paths parameters are mapped as required keyword arguments of the `NucliaDB` class methods: hence the `kbid="my-kbid"`. Any other keyword arguments specified in the method will be sent along in the json request body of the HTTP request.

Alternatively, you can also define the `content` parameter and pass an instance of the Pydantic model that the endpoint expects:

```python
>>> from nucliadb_sdk import NucliaDB
>>> from nucliadb_models.writer import CreateResourcePayload
>>> 
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> content = CreateResourcePayload(slug="my-resource", title="My Resource")
>>> ndb.create_resource(kbid="my-kbid", content=content)
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
```

Query parameters can be passed too on each method with the `query_params` argument. For instance:

```python
>>> ndb.get_resource_by_id(kbid="my-kbid", rid="rid", query_params={"show": ["values"]})
```

### Example Usage

The following is a sample script that fetches the HTML of a website, extracts all links that it finds on it and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.

```python
from nucliadb_models.link import LinkField
from nucliadb_models.writer import CreateResourcePayload
import nucliadb_sdk
import requests
from bs4 import BeautifulSoup


def extract_links_from_url(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    unique_links = set()
    for link in soup.find_all("a"):
        unique_links.add(link.get("href"))
    return unique_links


def upload_link_to_nuclia(ndb, *, kbid, link, tags):
    try:
        title = link.replace("-", " ")
        slug = "-".join(tags) + "-" + link.split("/")[-1]
        content = CreateResourcePayload(
            title=title,
            slug=slug,
            links={
                "link": LinkField(
                    uri=link,
                    language="en",
                )
            },
        )
        ndb.create_resource(kbid=kbid, content=content)
        print(f"Resource created from {link}. Title={title} Slug={slug}")
    except nucliadb_sdk.exceptions.ConflictError:
        print(f"Resource already exists: {link} {slug}")
    except Exception as ex:
        print(f"Failed to create resource: {link} {slug}: {ex}")


def main(site):
    # Define the NucliaDB instance with region and URL
    ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080")

    # Loop through extracted links and upload to NucliaDB
    for link in extract_links_from_url(site):
        upload_link_to_nuclia(ndb, kbid="my-kb-id", link=link, tags=["news"])

if __name__ == "__main__":
    main(site="https://en.wikipedia.org/wiki/The_Lion_King")
```

After the data is pushed, the NucliaDB SDK could also be used to find answers on top of the extracted links.

```python
>>> import nucliadb_sdk
>>> 
>>> ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080")
>>> resp = ndb.ask(kbid="my-kb-id", query="What does Hakuna Matata mean?")
>>> print(resp.answer)
'Hakuna matata is actually a phrase in the East African language of Swahili that literally means “no trouble” or “no problems”.'
```

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/nuclia/nucliadb",
    "name": "nucliadb-sdk",
    "maintainer": null,
    "docs_url": null,
    "requires_python": "<4,>=3.9",
    "maintainer_email": null,
    "keywords": null,
    "author": null,
    "author_email": null,
    "download_url": null,
    "platform": null,
    "description": "# NucliaDB SDK\n\nThe NucliaDB SDK is a Python library designed as a thin wrapper around the [NucliaDB HTTP API](https://docs.nuclia.dev/docs/api). It is tailored for developers who wish to create low-level scripts to interact with NucliaDB.\n\n## WARNING\n\n\u26a0 If it's your first time using Nuclia or you want a simple way to push your unstructured data to Nuclia with a script or a CLI, we highly recommend using the [Nuclia CLI/SDK](https://github.com/nuclia/nuclia.py) instead, as it is much more user-friendly and use-case focused. \u26a0\n\n## Installation\n\nTo install it, simply with pip:\n\n```bash\npip install nucliadb-sdk\n```\n\n## How to use it?\n\nTo connect to a Nuclia-hosted NucliaDB instance, just use the `NucliaDB` constructor method with the `api_key`:\n\n```python\nfrom nucliadb_sdk import NucliaDB, Region\n\nndb = NucliaDB(region=Region.EUROPE1, api_key=\"my-api-key\")\n```\n\nAlternatively, to connect to a NucliaDB local installation, use:\n\n```python\nndb = NucliaDB(region=Region.ON_PREM, api=\"http://localhost:8080/api\")\n```\n\nThen, each method of the `NucliaDB` class maps to an HTTP endpoint of the NucliaDB API. The parameters it accepts correspond to the Pydantic models associated to the request body scheme of the endpoint.\n\nThe method-to-endpoint mappings for the sdk are declared in-code\u00a0[in the _NucliaDBBase class](https://github.com/nuclia/nucliadb/blob/main/nucliadb_sdk/src/nucliadb_sdk/v2/sdk.py).\n\nFor instance, to create a resource in your Knowledge Box, the endpoint is defined [here](https://docs.nuclia.dev/docs/api#tag/Resources/operation/Create_Resource_kb__kbid__resources_post).\n\nIt has a `{kbid}` path parameter and is expecting a json payload with some optional keys like `slug` or `title`, that are of type string. With `curl`, the command would be:\n\n```bash\ncurl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{\"slug\":\"my-resource\",\"title\":\"My Resource\"}' -H \"Content-Type: application/json\"\n{\"uuid\":\"fbdb10a79abc45c0b13400f5697ea2ba\",\"seqid\":1}\n```\n\nand with the NucliaDB sdk:\n\n```python\n>>> from nucliadb_sdk import NucliaDB\n>>>\n>>> ndb = NucliaDB(region=\"on-prem\", url=\"http://localhost:8080/api\")\n>>> ndb.create_resource(kbid=\"my-kbid\", slug=\"my-resource\", title=\"My Resource\")\nResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)\n```\n\nNote that paths parameters are mapped as required keyword arguments of the `NucliaDB` class methods: hence the `kbid=\"my-kbid\"`. Any other keyword arguments specified in the method will be sent along in the json request body of the HTTP request.\n\nAlternatively, you can also define the `content` parameter and pass an instance of the Pydantic model that the endpoint expects:\n\n```python\n>>> from nucliadb_sdk import NucliaDB\n>>> from nucliadb_models.writer import CreateResourcePayload\n>>> \n>>> ndb = NucliaDB(region=\"on-prem\", url=\"http://localhost:8080/api\")\n>>> content = CreateResourcePayload(slug=\"my-resource\", title=\"My Resource\")\n>>> ndb.create_resource(kbid=\"my-kbid\", content=content)\nResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)\n```\n\nQuery parameters can be passed too on each method with the `query_params` argument. For instance:\n\n```python\n>>> ndb.get_resource_by_id(kbid=\"my-kbid\", rid=\"rid\", query_params={\"show\": [\"values\"]})\n```\n\n### Example Usage\n\nThe following is a sample script that fetches the HTML of a website, extracts all links that it finds on it and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.\n\n```python\nfrom nucliadb_models.link import LinkField\nfrom nucliadb_models.writer import CreateResourcePayload\nimport nucliadb_sdk\nimport requests\nfrom bs4 import BeautifulSoup\n\n\ndef extract_links_from_url(url):\n    response = requests.get(url)\n    soup = BeautifulSoup(response.text, \"html.parser\")\n    unique_links = set()\n    for link in soup.find_all(\"a\"):\n        unique_links.add(link.get(\"href\"))\n    return unique_links\n\n\ndef upload_link_to_nuclia(ndb, *, kbid, link, tags):\n    try:\n        title = link.replace(\"-\", \" \")\n        slug = \"-\".join(tags) + \"-\" + link.split(\"/\")[-1]\n        content = CreateResourcePayload(\n            title=title,\n            slug=slug,\n            links={\n                \"link\": LinkField(\n                    uri=link,\n                    language=\"en\",\n                )\n            },\n        )\n        ndb.create_resource(kbid=kbid, content=content)\n        print(f\"Resource created from {link}. Title={title} Slug={slug}\")\n    except nucliadb_sdk.exceptions.ConflictError:\n        print(f\"Resource already exists: {link} {slug}\")\n    except Exception as ex:\n        print(f\"Failed to create resource: {link} {slug}: {ex}\")\n\n\ndef main(site):\n    # Define the NucliaDB instance with region and URL\n    ndb = nucliadb_sdk.NucliaDB(region=\"on-prem\", url=\"http://localhost:8080\")\n\n    # Loop through extracted links and upload to NucliaDB\n    for link in extract_links_from_url(site):\n        upload_link_to_nuclia(ndb, kbid=\"my-kb-id\", link=link, tags=[\"news\"])\n\nif __name__ == \"__main__\":\n    main(site=\"https://en.wikipedia.org/wiki/The_Lion_King\")\n```\n\nAfter the data is pushed, the NucliaDB SDK could also be used to find answers on top of the extracted links.\n\n```python\n>>> import nucliadb_sdk\n>>> \n>>> ndb = nucliadb_sdk.NucliaDB(region=\"on-prem\", url=\"http://localhost:8080\")\n>>> resp = ndb.ask(kbid=\"my-kb-id\", query=\"What does Hakuna Matata mean?\")\n>>> print(resp.answer)\n'Hakuna matata is actually a phrase in the East African language of Swahili that literally means \u201cno trouble\u201d or \u201cno problems\u201d.'\n```\n",
    "bugtrack_url": null,
    "license": "BSD",
    "summary": null,
    "version": "6.0.1.post2461",
    "project_urls": {
        "Homepage": "https://github.com/nuclia/nucliadb"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "adc22adb100d1d1d95a1cab26a6768bd07389bff5c0b0cf3c9d5ecc5dbcaf2fa",
                "md5": "6924b1f83342d75f70a088afae553950",
                "sha256": "cf8c709ea657f460aedfc5736642769358f01efafa77c9015d52e693a6e3f3a6"
            },
            "downloads": -1,
            "filename": "nucliadb_sdk-6.0.1.post2461-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "6924b1f83342d75f70a088afae553950",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": "<4,>=3.9",
            "size": 19845,
            "upload_time": "2024-12-05T15:47:25",
            "upload_time_iso_8601": "2024-12-05T15:47:25.207456Z",
            "url": "https://files.pythonhosted.org/packages/ad/c2/2adb100d1d1d95a1cab26a6768bd07389bff5c0b0cf3c9d5ecc5dbcaf2fa/nucliadb_sdk-6.0.1.post2461-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2024-12-05 15:47:25",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "nuclia",
    "github_project": "nucliadb",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "lcname": "nucliadb-sdk"
}
        
Elapsed time: 0.45416s