# NucliaDB SDK
The NucliaDB SDK is a Python library designed as a thin wrapper around the [NucliaDB HTTP API](https://docs.nuclia.dev/docs/api). It is tailored for developers who wish to create low-level scripts to interact with NucliaDB.
## WARNING
⚠ If it's your first time using Nuclia or you want a simple way to push your unstructured data to Nuclia with a script or a CLI, we highly recommend using the [Nuclia CLI/SDK](https://github.com/nuclia/nuclia.py) instead, as it is much more user-friendly and use-case focused. ⚠
## Installation
To install it, simply with pip:
```bash
pip install nucliadb-sdk
```
## How to use it?
To connect to a Nuclia-hosted NucliaDB instance, just use the `NucliaDB` constructor method with the `api_key`:
```python
from nucliadb_sdk import NucliaDB, Region
ndb = NucliaDB(region=Region.EUROPE1, api_key="my-api-key")
```
Alternatively, to connect to a NucliaDB local installation, use:
```python
ndb = NucliaDB(region=Region.ON_PREM, api="http://localhost:8080/api")
```
Then, each method of the `NucliaDB` class maps to an HTTP endpoint of the NucliaDB API. The parameters it accepts correspond to the Pydantic models associated to the request body scheme of the endpoint.
The method-to-endpoint mappings for the sdk are declared in-code [in the _NucliaDBBase class](https://github.com/nuclia/nucliadb/blob/main/nucliadb_sdk/src/nucliadb_sdk/v2/sdk.py).
For instance, to create a resource in your Knowledge Box, the endpoint is defined [here](https://docs.nuclia.dev/docs/api#tag/Resources/operation/Create_Resource_kb__kbid__resources_post).
It has a `{kbid}` path parameter and is expecting a json payload with some optional keys like `slug` or `title`, that are of type string. With `curl`, the command would be:
```bash
curl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{"slug":"my-resource","title":"My Resource"}' -H "Content-Type: application/json"
{"uuid":"fbdb10a79abc45c0b13400f5697ea2ba","seqid":1}
```
and with the NucliaDB sdk:
```python
>>> from nucliadb_sdk import NucliaDB
>>>
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> ndb.create_resource(kbid="my-kbid", slug="my-resource", title="My Resource")
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
```
Note that paths parameters are mapped as required keyword arguments of the `NucliaDB` class methods: hence the `kbid="my-kbid"`. Any other keyword arguments specified in the method will be sent along in the json request body of the HTTP request.
Alternatively, you can also define the `content` parameter and pass an instance of the Pydantic model that the endpoint expects:
```python
>>> from nucliadb_sdk import NucliaDB
>>> from nucliadb_models.writer import CreateResourcePayload
>>>
>>> ndb = NucliaDB(region="on-prem", url="http://localhost:8080/api")
>>> content = CreateResourcePayload(slug="my-resource", title="My Resource")
>>> ndb.create_resource(kbid="my-kbid", content=content)
ResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)
```
Query parameters can be passed too on each method with the `query_params` argument. For instance:
```python
>>> ndb.get_resource_by_id(kbid="my-kbid", rid="rid", query_params={"show": ["values"]})
```
### Example Usage
The following is a sample script that fetches the HTML of a website, extracts all links that it finds on it and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.
```python
from nucliadb_models.link import LinkField
from nucliadb_models.writer import CreateResourcePayload
import nucliadb_sdk
import requests
from bs4 import BeautifulSoup
def extract_links_from_url(url):
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
unique_links = set()
for link in soup.find_all("a"):
unique_links.add(link.get("href"))
return unique_links
def upload_link_to_nuclia(ndb, *, kbid, link, tags):
try:
title = link.replace("-", " ")
slug = "-".join(tags) + "-" + link.split("/")[-1]
content = CreateResourcePayload(
title=title,
slug=slug,
links={
"link": LinkField(
uri=link,
language="en",
)
},
)
ndb.create_resource(kbid=kbid, content=content)
print(f"Resource created from {link}. Title={title} Slug={slug}")
except nucliadb_sdk.exceptions.ConflictError:
print(f"Resource already exists: {link} {slug}")
except Exception as ex:
print(f"Failed to create resource: {link} {slug}: {ex}")
def main(site):
# Define the NucliaDB instance with region and URL
ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080")
# Loop through extracted links and upload to NucliaDB
for link in extract_links_from_url(site):
upload_link_to_nuclia(ndb, kbid="my-kb-id", link=link, tags=["news"])
if __name__ == "__main__":
main(site="https://en.wikipedia.org/wiki/The_Lion_King")
```
After the data is pushed, the NucliaDB SDK could also be used to find answers on top of the extracted links.
```python
>>> import nucliadb_sdk
>>>
>>> ndb = nucliadb_sdk.NucliaDB(region="on-prem", url="http://localhost:8080")
>>> resp = ndb.ask(kbid="my-kb-id", query="What does Hakuna Matata mean?")
>>> print(resp.answer)
'Hakuna matata is actually a phrase in the East African language of Swahili that literally means “no trouble” or “no problems”.'
```
Raw data
{
"_id": null,
"home_page": "https://github.com/nuclia/nucliadb",
"name": "nucliadb-sdk",
"maintainer": null,
"docs_url": null,
"requires_python": "<4,>=3.9",
"maintainer_email": null,
"keywords": null,
"author": null,
"author_email": null,
"download_url": null,
"platform": null,
"description": "# NucliaDB SDK\n\nThe NucliaDB SDK is a Python library designed as a thin wrapper around the [NucliaDB HTTP API](https://docs.nuclia.dev/docs/api). It is tailored for developers who wish to create low-level scripts to interact with NucliaDB.\n\n## WARNING\n\n\u26a0 If it's your first time using Nuclia or you want a simple way to push your unstructured data to Nuclia with a script or a CLI, we highly recommend using the [Nuclia CLI/SDK](https://github.com/nuclia/nuclia.py) instead, as it is much more user-friendly and use-case focused. \u26a0\n\n## Installation\n\nTo install it, simply with pip:\n\n```bash\npip install nucliadb-sdk\n```\n\n## How to use it?\n\nTo connect to a Nuclia-hosted NucliaDB instance, just use the `NucliaDB` constructor method with the `api_key`:\n\n```python\nfrom nucliadb_sdk import NucliaDB, Region\n\nndb = NucliaDB(region=Region.EUROPE1, api_key=\"my-api-key\")\n```\n\nAlternatively, to connect to a NucliaDB local installation, use:\n\n```python\nndb = NucliaDB(region=Region.ON_PREM, api=\"http://localhost:8080/api\")\n```\n\nThen, each method of the `NucliaDB` class maps to an HTTP endpoint of the NucliaDB API. The parameters it accepts correspond to the Pydantic models associated to the request body scheme of the endpoint.\n\nThe method-to-endpoint mappings for the sdk are declared in-code\u00a0[in the _NucliaDBBase class](https://github.com/nuclia/nucliadb/blob/main/nucliadb_sdk/src/nucliadb_sdk/v2/sdk.py).\n\nFor instance, to create a resource in your Knowledge Box, the endpoint is defined [here](https://docs.nuclia.dev/docs/api#tag/Resources/operation/Create_Resource_kb__kbid__resources_post).\n\nIt has a `{kbid}` path parameter and is expecting a json payload with some optional keys like `slug` or `title`, that are of type string. With `curl`, the command would be:\n\n```bash\ncurl -XPOST http://localhost:8080/api/v1/kb/my-kbid/resources -H 'x-nucliadb-roles: WRITER' --data-binary '{\"slug\":\"my-resource\",\"title\":\"My Resource\"}' -H \"Content-Type: application/json\"\n{\"uuid\":\"fbdb10a79abc45c0b13400f5697ea2ba\",\"seqid\":1}\n```\n\nand with the NucliaDB sdk:\n\n```python\n>>> from nucliadb_sdk import NucliaDB\n>>>\n>>> ndb = NucliaDB(region=\"on-prem\", url=\"http://localhost:8080/api\")\n>>> ndb.create_resource(kbid=\"my-kbid\", slug=\"my-resource\", title=\"My Resource\")\nResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)\n```\n\nNote that paths parameters are mapped as required keyword arguments of the `NucliaDB` class methods: hence the `kbid=\"my-kbid\"`. Any other keyword arguments specified in the method will be sent along in the json request body of the HTTP request.\n\nAlternatively, you can also define the `content` parameter and pass an instance of the Pydantic model that the endpoint expects:\n\n```python\n>>> from nucliadb_sdk import NucliaDB\n>>> from nucliadb_models.writer import CreateResourcePayload\n>>> \n>>> ndb = NucliaDB(region=\"on-prem\", url=\"http://localhost:8080/api\")\n>>> content = CreateResourcePayload(slug=\"my-resource\", title=\"My Resource\")\n>>> ndb.create_resource(kbid=\"my-kbid\", content=content)\nResourceCreated(uuid='fbdb10a79abc45c0b13400f5697ea2ba', elapsed=None, seqid=1)\n```\n\nQuery parameters can be passed too on each method with the `query_params` argument. For instance:\n\n```python\n>>> ndb.get_resource_by_id(kbid=\"my-kbid\", rid=\"rid\", query_params={\"show\": [\"values\"]})\n```\n\n### Example Usage\n\nThe following is a sample script that fetches the HTML of a website, extracts all links that it finds on it and pushes them to NucliaDB so that they get processed by Nuclia's processing engine.\n\n```python\nfrom nucliadb_models.link import LinkField\nfrom nucliadb_models.writer import CreateResourcePayload\nimport nucliadb_sdk\nimport requests\nfrom bs4 import BeautifulSoup\n\n\ndef extract_links_from_url(url):\n response = requests.get(url)\n soup = BeautifulSoup(response.text, \"html.parser\")\n unique_links = set()\n for link in soup.find_all(\"a\"):\n unique_links.add(link.get(\"href\"))\n return unique_links\n\n\ndef upload_link_to_nuclia(ndb, *, kbid, link, tags):\n try:\n title = link.replace(\"-\", \" \")\n slug = \"-\".join(tags) + \"-\" + link.split(\"/\")[-1]\n content = CreateResourcePayload(\n title=title,\n slug=slug,\n links={\n \"link\": LinkField(\n uri=link,\n language=\"en\",\n )\n },\n )\n ndb.create_resource(kbid=kbid, content=content)\n print(f\"Resource created from {link}. Title={title} Slug={slug}\")\n except nucliadb_sdk.exceptions.ConflictError:\n print(f\"Resource already exists: {link} {slug}\")\n except Exception as ex:\n print(f\"Failed to create resource: {link} {slug}: {ex}\")\n\n\ndef main(site):\n # Define the NucliaDB instance with region and URL\n ndb = nucliadb_sdk.NucliaDB(region=\"on-prem\", url=\"http://localhost:8080\")\n\n # Loop through extracted links and upload to NucliaDB\n for link in extract_links_from_url(site):\n upload_link_to_nuclia(ndb, kbid=\"my-kb-id\", link=link, tags=[\"news\"])\n\nif __name__ == \"__main__\":\n main(site=\"https://en.wikipedia.org/wiki/The_Lion_King\")\n```\n\nAfter the data is pushed, the NucliaDB SDK could also be used to find answers on top of the extracted links.\n\n```python\n>>> import nucliadb_sdk\n>>> \n>>> ndb = nucliadb_sdk.NucliaDB(region=\"on-prem\", url=\"http://localhost:8080\")\n>>> resp = ndb.ask(kbid=\"my-kb-id\", query=\"What does Hakuna Matata mean?\")\n>>> print(resp.answer)\n'Hakuna matata is actually a phrase in the East African language of Swahili that literally means \u201cno trouble\u201d or \u201cno problems\u201d.'\n```\n",
"bugtrack_url": null,
"license": "BSD",
"summary": null,
"version": "6.0.1.post2461",
"project_urls": {
"Homepage": "https://github.com/nuclia/nucliadb"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "adc22adb100d1d1d95a1cab26a6768bd07389bff5c0b0cf3c9d5ecc5dbcaf2fa",
"md5": "6924b1f83342d75f70a088afae553950",
"sha256": "cf8c709ea657f460aedfc5736642769358f01efafa77c9015d52e693a6e3f3a6"
},
"downloads": -1,
"filename": "nucliadb_sdk-6.0.1.post2461-py3-none-any.whl",
"has_sig": false,
"md5_digest": "6924b1f83342d75f70a088afae553950",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": "<4,>=3.9",
"size": 19845,
"upload_time": "2024-12-05T15:47:25",
"upload_time_iso_8601": "2024-12-05T15:47:25.207456Z",
"url": "https://files.pythonhosted.org/packages/ad/c2/2adb100d1d1d95a1cab26a6768bd07389bff5c0b0cf3c9d5ecc5dbcaf2fa/nucliadb_sdk-6.0.1.post2461-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-12-05 15:47:25",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "nuclia",
"github_project": "nucliadb",
"travis_ci": false,
"coveralls": true,
"github_actions": true,
"lcname": "nucliadb-sdk"
}