# Flatten Tei
## Reformat TEI-XML files to raw text + standoff annotations in JSON (flatdoc)
* `flatdoc` is not a standardized format.
* A `flatdoc` is a JSON file that contains the whole text of a document in the `text` field.
* All span annotations are stored in the `annotations` field as an object keyed by annotation type.
* e.g. `{"Sentence": [{"begin": 0, "end": 13}, ...], ...}`
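To make the layout above concrete, here is a minimal sketch of a flatdoc-like object and of how a span's text could be recovered from its offsets. It assumes that `begin` and `end` are character offsets into `text` and that `end` is exclusive (the Python slice convention); the source does not state this. The helper `span_texts` is hypothetical and not part of `flattentei`.

```python
# A hand-built object following the layout described above (illustrative data only).
flatdoc = {
    "text": "Hello, world. Second sentence here.",
    "annotations": {
        "Sentence": [
            {"begin": 0, "end": 13},
            {"begin": 14, "end": 35},
        ]
    },
}


def span_texts(doc, layer):
    """Return the text covered by each annotation of the given layer.

    Assumption: offsets are character positions and `end` is exclusive.
    """
    text = doc["text"]
    return [text[a["begin"]:a["end"]] for a in doc["annotations"].get(layer, [])]


sentences = span_texts(flatdoc, "Sentence")
print(sentences)
```

Because the annotations only point into `text`, several annotation layers can overlap freely without changing the text itself.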
## Access content of `flatdoc` files
### Use Case: Get all Sentences of a document in `flatdoc`-format
* Assuming the document has `Sentence` annotations.
```python
import json

from flattentei import get_units

fn = "path/to/flatdoc.json"  # placeholder: filename of your flatdoc JSON file
with open(fn) as f:
    flatdoc = json.load(f)

sentences = get_units("Sentence", flatdoc)
```
### Use Case: Get all Entities of a document in `flatdoc`-format
* Assuming the entities are stored as `Entity` in the `annotations` field
* (In the GSAP project: `ScholarlyEntitiy`)
* Each entity is enriched with the text of its `Sentence`
* The sentence can be found in the `containers` field of each entity
```python
import json

from flattentei import get_units

fn = "path/to/flatdoc.json"  # placeholder: filename of your flatdoc JSON file
with open(fn) as f:
    flatdoc = json.load(f)

# Request the Sentence annotation as container for each entity
entities = get_units("Entity", flatdoc, enrich_container="Sentence")

for ent in entities:
    print(f'The entity span: {ent["text"]}')
    sentence_text = ent['containers']['Sentence']['text']
```
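
To illustrate the idea of enriching an entity with its containing sentence, the following is a rough sketch of how such a lookup could work on plain offsets. It is not the `flattentei` implementation: the function `attach_container` is hypothetical, and the sketch assumes character offsets with an exclusive `end` and that a container is any span that fully covers the entity.

```python
def attach_container(entities, containers, text, name="Sentence"):
    """Add the text and the first fully covering container span to each entity.

    Hypothetical helper for illustration; offsets are assumed to be
    character positions with an exclusive `end`.
    """
    enriched = []
    for ent in entities:
        # First container whose span fully covers the entity span, if any.
        match = next(
            (c for c in containers
             if c["begin"] <= ent["begin"] and ent["end"] <= c["end"]),
            None,
        )
        item = dict(ent, text=text[ent["begin"]:ent["end"]], containers={})
        if match is not None:
            item["containers"][name] = dict(
                match, text=text[match["begin"]:match["end"]]
            )
        enriched.append(item)
    return enriched


# Illustrative data only.
text = "We use Python. The tool is small."
sentence_spans = [{"begin": 0, "end": 14}, {"begin": 15, "end": 33}]
entity_spans = [{"begin": 7, "end": 13}]

enriched = attach_container(entity_spans, sentence_spans, text)
print(enriched[0]["text"])
print(enriched[0]["containers"]["Sentence"]["text"])
```

The linear scan is enough for a sketch; for long documents with many spans, sorting the containers and using binary search would be a natural refinement.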
Raw data
{
"_id": null,
"home_page": "https://github.com/ottowg/flatten-tei",
"name": "flattentei",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": null,
"author": "Wolf Otto",
"author_email": "wolfgang.otto@gesis.org",
"download_url": "https://files.pythonhosted.org/packages/0f/8d/f8ba573291f106cd9eae4898aba7e4087f1283ff23c5a57dcf268a81c3a4/flattentei-0.1.7.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "BSD 2-clause",
"summary": "Transform tei xml to a simple standoff format",
"version": "0.1.7",
"project_urls": {
"Homepage": "https://github.com/ottowg/flatten-tei"
},
"split_keywords": [],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "0f8df8ba573291f106cd9eae4898aba7e4087f1283ff23c5a57dcf268a81c3a4",
"md5": "2e24a78df8cc916e7958737bcf6c6ec0",
"sha256": "98835ee9b75173075c74dc45b75b00e786d105f59dacf7d7d8f058d926ead680"
},
"downloads": -1,
"filename": "flattentei-0.1.7.tar.gz",
"has_sig": false,
"md5_digest": "2e24a78df8cc916e7958737bcf6c6ec0",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 9515,
"upload_time": "2025-07-10T12:46:40",
"upload_time_iso_8601": "2025-07-10T12:46:40.308156Z",
"url": "https://files.pythonhosted.org/packages/0f/8d/f8ba573291f106cd9eae4898aba7e4087f1283ff23c5a57dcf268a81c3a4/flattentei-0.1.7.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-07-10 12:46:40",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "ottowg",
"github_project": "flatten-tei",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "flattentei"
}