sskb


Name: sskb
Version: 0.1.5
Home page: None
Summary: Knowledge Base loading and annotation facilities
Upload time: 2024-09-13 13:41:44
Maintainer: None
Docs URL: None
Author: Danilo S. Carvalho
Requires Python: >=3.9
License: None
Keywords: knowledge bases, annotated, nlp
Requirements: No requirements were recorded.
# Simple Statement Knowledge Bases (SSKB)
### Knowledge Base loading and annotation facilities

The *sskb* library provides easy access to Natural Language Knowledge Bases (KBs), along with tools to facilitate annotation.

It exposes available KBs as sequences of simple statements. For example (from ProofWiki):

```
"A '''set''' is intuitively defined as any aggregation of 
objects, called elements, which can be precisely defined in 
some way or other."
```

Each statement is accompanied by relevant metadata, in the form of *premises* necessary for the statement to be true and named entities associated with the respective KB.

*SSKB* is built upon the [Simple Annotation Framework (SAF)](https://github.com/dscarvalho/saf) library, which provides its data model and API.
This means it is compatible with [saf-datasets](https://github.com/neuro-symbolic-ai/saf_datasets) annotators.


## Installation

To install, you can use pip:

```bash
pip install sskb
```

## Usage
### Loading KBs and accessing data

```python
from sskb import ProofWikiKB

kb = ProofWikiKB()
print(len(kb))  # Number of statements in the KB
# 146723

print(kb[0].surface)  # First statement in the KB
# A '''set''' is intuitively defined as any aggregation of objects, called elements, which can be precisely defined in some way or other.

print([token.surface for token in kb[0].tokens])  # Tokens (spaCy) of the first statement.
# ['A', "''", "'", 'set', "''", "'", 'is', 'intuitively', 'defined', 'as', 'any', 'aggregation', 'of', 'objects', ',', 'called', 'elements', ',', 'which', 'can', 'be', 'precisely', 'defined', 'in', 'some', 'way', 'or', 'other', '.']


print(kb[0].annotations)  # Annotations for the first sentence
# {'split': 'KB', 'type': 'fact', 'id': 337113631216859490898241823584484375642}


# There are no token annotations in this dataset
print([(tok.surface, tok.annotations) for tok in kb[0].tokens])
# [('A', {}), ("''", {}), ("'", {}), ('set', {}), ("''", {}), ("'", {}), ('is', {}), ('intuitively', {}), ('defined', {}), ('as', {}), ('any', {}), ('aggregation', {}), ('of', {}), ('objects', {}), (',', {}), ('called', {}), ('elements', {}), (',', {}), ('which', {}), ('can', {}), ('be', {}), ('precisely', {}), ('defined', {}), ('in', {}), ('some', {}), ('way', {}), ('or', {}), ('other', {}), ('.', {})]

# Entities cited in a statement
print([entity.surface for entity in kb[0].entities])
# ['Set', 'Or', 'Aggregation']

# Accessing statements by KB identifier
set_related = kb[337113631216859490898241823584484375642] # All statements connected to this identifier

print(len(set_related))
# 40

print(set_related[10].surface)
# If there are many elements in a set, then it becomes tedious and impractical to list them all in one big long explicit definition. Fortunately, however, there are other techniques for listing sets.

# Filtering ProofWiki propositions
train_propositions = [stt for stt in kb 
                      if (stt.annotations["type"] == "proposition" and stt.annotations["split"] == "train")]

print(train_propositions[0].surface)
# Let $A$ be a preadditive category.

print("\n".join([prem.surface for prem in train_propositions[0].premises]))
# Let $\mathbf C$ be a metacategory.
# Let $A$ and $B$ be objects of $\mathbf C$.
# A '''(binary) product diagram''' for $A$ and $B$ comprises an object $P$ and morphisms $p_1: P \to A$, $p_2: P \to B$:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
#  A
# &
#  P
#   \ar[l]_*+{p_1}
#   \ar[r]^*+{p_2}
# &
#  B
# }\end{xy}$
# subjected to the following universal mapping property:
# :For any object $X$ and morphisms $x_1, x_2$ like so:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
#  A
# &
#  X
#   \ar[l]_*+{x_1}
#   \ar[r]^*+{x_2}
# &
#  B
# }\end{xy}$
# :there is a unique morphism $u: X \to P$ such that:
# ::$\begin{xy}\xymatrix@+1em@L+3px{
# &
#  X
#   \ar[ld]_*+{x_1}
#   \ar@{-->}[d]^*+{u}
#   \ar[rd]^*+{x_2}
# \\
#  A
# &
#  P
#   \ar[l]^*+{p_1}
#   \ar[r]_*+{p_2}
# &
#  B
# }\end{xy}$
# :is a commutative diagram, i.e., $x_1 = p_1 \circ u$ and $x_2 = p_2 \circ u$.
# In this situation, $P$ is called a '''(binary) product of $A$ and $B$''' and may be denoted $A \times B$.
# Generally, one writes $\left\langle{x_1, x_2}\right\rangle$ for the unique morphism $u$ determined by above diagram.
# The morphisms $p_1$ and $p_2$ are often taken to be implicit.
# They are called '''projections'''; if necessary, $p_1$ can be called the '''first projection''' and $p_2$ the '''second projection'''.
# {{expand|the projection definition may merit its own, separate page}}
```
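
Statements carry their sentence-level metadata in plain `annotations` dictionaries (as shown above), so custom annotations can be attached to them directly; saf-datasets annotators are expected to populate the same structures. A minimal sketch, where the `num_premises` key is purely illustrative and not part of the library:

```python
from sskb import ProofWikiKB

kb = ProofWikiKB()

# Illustrative only: store the number of premises of each statement as a
# sentence-level annotation, alongside the existing 'split'/'type'/'id' fields.
# The 'num_premises' key is made up for this sketch.
for statement in kb:
    statement.annotations["num_premises"] = len(statement.premises)

print(kb[0].annotations["num_premises"])
```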

**Available datasets:** e-SNLI (ESNLIKB), ProofWiki (ProofWikiKB), WorldTree (WorldTreeKB).
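
The other KBs should be loadable through the same interface; a sketch, assuming `ESNLIKB` and `WorldTreeKB` are importable from the package root like `ProofWikiKB`:

```python
from sskb import ESNLIKB, WorldTreeKB  # class names from the list above

# Assuming these KBs expose the same sequence interface as ProofWikiKB.
esnli = ESNLIKB()
worldtree = WorldTreeKB()

print(len(esnli), len(worldtree))  # number of statements in each KB
print(worldtree[0].surface)        # first WorldTree statement
```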


            
