# Simple Statement Knowledge Bases (SSKB)
### Knowledge Base loading and annotation facilities
The *sskb* library provides easy access to natural-language Knowledge Bases (KBs) and tools to facilitate their annotation.
It exposes available KBs as sequences of simple statements. For example (from ProofWiki):
```
"A '''set''' is intuitively defined as any aggregation of
objects, called elements, which can be precisely defined in
some way or other."
```
Each statement is accompanied by relevant metadata: the *premises* required for the statement to hold, and the named entities associated with it in the respective KB.
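Conceptually, each statement bundles its text with annotations, premises, and entities. The sketch below illustrates that shape with plain dataclasses; the real classes come from SAF, so the names and fields here are illustrative only, mirroring the attributes (`surface`, `annotations`, `premises`, `entities`) shown in the usage examples.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative stand-in for the SAF-backed statement objects;
# not the actual sskb/SAF class definitions.
@dataclass
class Statement:
    surface: str                                        # the statement text
    annotations: Dict[str, object] = field(default_factory=dict)
    premises: List["Statement"] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)

axiom = Statement(surface="Let $A$ and $B$ be sets.")
claim = Statement(
    surface="A '''set''' is intuitively defined as any aggregation of objects.",
    annotations={"type": "fact", "split": "KB"},
    premises=[axiom],              # statements this one depends on
    entities=["Set", "Aggregation"],
)
print(claim.premises[0].surface)
# Let $A$ and $B$ be sets.
```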
*SSKB* is built upon the [Simple Annotation Framework (SAF)](https://github.com/dscarvalho/saf) library, which provides its data model and API.
This means it is compatible with [saf-datasets](https://github.com/neuro-symbolic-ai/saf_datasets) annotators.
## Installation
To install, you can use pip:
```bash
pip install sskb
```
## Usage
### Loading KBs and accessing data
```python
from sskb import ProofWikiKB
kb = ProofWikiKB()
print(len(kb)) # Number of statements in the KB
# 146723
print(kb[0].surface) # First statement in the KB
# A '''set''' is intuitively defined as any aggregation of objects, called elements, which can be precisely defined in some way or other.
print([token.surface for token in kb[0].tokens]) # Tokens (spaCy tokenization) of the first statement
# ['A', "''", "'", 'set', "''", "'", 'is', 'intuitively', 'defined', 'as', 'any', 'aggregation', 'of', 'objects', ',', 'called', 'elements', ',', 'which', 'can', 'be', 'precisely', 'defined', 'in', 'some', 'way', 'or', 'other', '.']
print(kb[0].annotations) # Annotations for the first statement
# {'split': 'KB', 'type': 'fact', 'id': 337113631216859490898241823584484375642}
# There are no token annotations in this dataset
print([(tok.surface, tok.annotations) for tok in kb[0].tokens])
# [('A', {}), ("''", {}), ("'", {}), ('set', {}), ("''", {}), ("'", {}), ('is', {}), ('intuitively', {}), ('defined', {}), ('as', {}), ('any', {}), ('aggregation', {}), ('of', {}), ('objects', {}), (',', {}), ('called', {}), ('elements', {}), (',', {}), ('which', {}), ('can', {}), ('be', {}), ('precisely', {}), ('defined', {}), ('in', {}), ('some', {}), ('way', {}), ('or', {}), ('other', {}), ('.', {})]
# Entities cited in a statement
print([entity.surface for entity in kb[0].entities])
# ['Set', 'Or', 'Aggregation']
# Accessing statements by KB identifier
set_related = kb[337113631216859490898241823584484375642] # All statements connected to this identifier
print(len(set_related))
# 40
print(set_related[10].surface)
# If there are many elements in a set, then it becomes tedious and impractical to list them all in one big long explicit definition. Fortunately, however, there are other techniques for listing sets.
# Filtering ProofWiki propositions
train_propositions = [stt for stt in kb
                      if stt.annotations["type"] == "proposition" and stt.annotations["split"] == "train"]
print(train_propositions[0].surface)
# Let $A$ be a preadditive category.
print("\n".join([prem.surface for prem in train_propositions[0].premises]))
# Let $\mathbf C$ be a metacategory.
# Let $A$ and $B$ be objects of $\mathbf C$.
# A '''(binary) product diagram''' for $A$ and $B$ comprises an object $P$ and morphisms $p_1: P \to A$, $p_2: P \to B$:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
# A
# &
# P
# \ar[l]_*+{p_1}
# \ar[r]^*+{p_2}
# &
# B
# }\end{xy}$
# subjected to the following universal mapping property:
# :For any object $X$ and morphisms $x_1, x_2$ like so:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
# A
# &
# X
# \ar[l]_*+{x_1}
# \ar[r]^*+{x_2}
# &
# B
# }\end{xy}$
# :there is a unique morphism $u: X \to P$ such that:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
# &
# X
# \ar[ld]_*+{x_1}
# \ar@{-->}[d]^*+{u}
# \ar[rd]^*+{x_2}
# \\
# A
# &
# P
# \ar[l]^*+{p_1}
# \ar[r]_*+{p_2}
# &
# B
# }\end{xy}$
# :is a commutative diagram, i.e., $x_1 = p_1 \circ u$ and $x_2 = p_2 \circ u$.
# In this situation, $P$ is called a '''(binary) product of $A$ and $B$''' and may be denoted $A \times B$.
# Generally, one writes $\left\langle{x_1, x_2}\right\rangle$ for the unique morphism $u$ determined by above diagram.
# The morphisms $p_1$ and $p_2$ are often taken to be implicit.
# They are called '''projections'''; if necessary, $p_1$ can be called the '''first projection''' and $p_2$ the '''second projection'''.
# {{expand|the projection definition may merit its own, separate page}}
```
**Available datasets:** e-SNLI (`ESNLIKB`), ProofWiki (`ProofWikiKB`), WorldTree (`WorldTreeKB`).
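Since each statement carries its entity list, a convenient access pattern is an inverted index from entity surface to the statements that cite it. Below is a minimal sketch of such a helper; it uses small `SimpleNamespace` stand-ins for the statement objects (the real ones, as shown above, expose `.surface` on both statements and entities), so the helper would work unchanged on any of the KB classes.

```python
from collections import defaultdict
from types import SimpleNamespace


def entity_index(statements):
    """Map each entity surface to the statements that cite it."""
    index = defaultdict(list)
    for stt in statements:
        for entity in stt.entities:
            index[entity.surface].append(stt)
    return index


def ent(surface):
    # Stand-in for an entity object with a .surface attribute.
    return SimpleNamespace(surface=surface)


# Tiny stand-in data; a real KB (e.g. ProofWikiKB()) yields compatible objects.
kb = [
    SimpleNamespace(surface="A set is an aggregation of objects.",
                    entities=[ent("Set"), ent("Aggregation")]),
    SimpleNamespace(surface="A subset is contained in a set.",
                    entities=[ent("Set"), ent("Subset")]),
]

idx = entity_index(kb)
print([stt.surface for stt in idx["Set"]])
# ['A set is an aggregation of objects.', 'A subset is contained in a set.']
```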