flattentei


 * Name: flattentei
 * Version: 0.1.7
 * Home page: https://github.com/ottowg/flatten-tei
 * Summary: Transform tei xml to a simple standoff format
 * Upload time: 2025-07-10 12:46:40
 * Author: Wolf Otto
 * License: BSD 2-clause

# Flatten Tei

## Reformat TEI XML files to raw text + standoff annotations in JSON (flatdoc)

 * `flatdoc` is not a standardized format.
 * A `flatdoc` is a JSON file that holds the whole text of a document in the `text` field.
   * All span annotations are stored in the `annotations` field as an object keyed by annotation type.
   * e.g. `{"Sentence": [{"begin": 0, "end": 13}, ...], ...}`
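The layout described above can be illustrated with a small hand-written document. This is only a sketch: the sample text and the helper `span_texts` are hypothetical and not part of `flattentei`, and it assumes that `begin` and `end` are character offsets into `text` with an exclusive `end`.

```python
import json

# A hand-written flatdoc; the key names follow the description above.
flatdoc = {
    "text": "Hello, world! We use spaCy here.",
    "annotations": {
        "Sentence": [
            {"begin": 0, "end": 13},
            {"begin": 14, "end": 32},
        ],
    },
}


def span_texts(doc, layer):
    """Return the text covered by each span of one annotation layer.

    Hypothetical helper, not part of flattentei.
    Assumption: begin/end are character offsets, end exclusive.
    """
    spans = doc["annotations"].get(layer, [])
    return [doc["text"][span["begin"]:span["end"]] for span in spans]


# The structure is plain JSON, so it survives a round trip unchanged.
restored = json.loads(json.dumps(flatdoc))
sentence_texts = span_texts(restored, "Sentence")
print(sentence_texts)
```

Because the annotations only store offsets, the original text stays untouched and any number of annotation layers can refer to the same characters.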

## Access content of `flatdoc` files

### Use Case: Get all Sentences of a document in `flatdoc`-format

  * Assuming the document contains `Sentence` annotations.

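```python
import json

from flattentei import get_units

# Path to a flatdoc JSON file (placeholder, replace with your own)
fn = "path/to/flatdoc.json"

with open(fn) as f:
    flatdoc = json.load(f)

# Collect all units of the "Sentence" annotation type
sentences = get_units("Sentence", flatdoc)
```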

### Use Case: Get all Entities of a document in `flatdoc`-format
  * Assuming the entities are stored as `Entity` in the `annotations` field
  * (In the GSAP project they are stored as `ScholarlyEntitiy`)
  * Enrich each entity with the text of its enclosing `Sentence`
    * The sentence can be found in the `containers` field of each entity

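```python
import json

from flattentei import get_units

# Path to a flatdoc JSON file (placeholder, replace with your own)
fn = "path/to/flatdoc.json"

with open(fn) as f:
    flatdoc = json.load(f)

# Collect all entities and attach the enclosing sentence to each one
entities = get_units("Entity", flatdoc, enrich_container="Sentence")

for ent in entities:
    print(f'The entity span: {ent["text"]}')
    # The enclosing sentence is stored under the container type name
    sentence_text = ent["containers"]["Sentence"]["text"]
    print(f"The containing sentence: {sentence_text}")
```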
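To make the idea of container enrichment concrete, here is a minimal stand-alone sketch of how an entity span could be matched to its enclosing sentence by offset containment. It is not the implementation used by `flattentei`: the function `enrich_with_container` and the sample document are hypothetical, offsets are assumed to be character positions with an exclusive `end`, and the result shape only mirrors the `containers` access shown in the example above.

```python
def enrich_with_container(doc, unit_layer, container_layer):
    """Attach the first enclosing container span to each unit span.

    Hypothetical helper, not part of flattentei.
    Assumption: begin/end are character offsets, end exclusive.
    """
    text = doc["text"]
    containers = doc["annotations"].get(container_layer, [])
    units = []
    for ann in doc["annotations"].get(unit_layer, []):
        # Copy the annotation and add the text it covers.
        unit = dict(ann, text=text[ann["begin"]:ann["end"]])
        for cont in containers:
            # A container encloses a unit if it starts no later and ends no earlier.
            if cont["begin"] <= ann["begin"] and ann["end"] <= cont["end"]:
                cont_text = text[cont["begin"]:cont["end"]]
                unit["containers"] = {container_layer: dict(cont, text=cont_text)}
                break
        units.append(unit)
    return units


# Hand-written sample document with one entity inside the second sentence.
flatdoc = {
    "text": "Hello, world! We use spaCy here.",
    "annotations": {
        "Sentence": [{"begin": 0, "end": 13}, {"begin": 14, "end": 32}],
        "Entity": [{"begin": 21, "end": 26}],
    },
}

entities = enrich_with_container(flatdoc, "Entity", "Sentence")
for ent in entities:
    print(ent["text"], "->", ent["containers"]["Sentence"]["text"])
```

The linear scan over containers keeps the sketch short; for long documents, sorting both layers by `begin` and walking them together would avoid the nested loop.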



            
