# poetree
The poetree library provides an easy way to get data from the PoeTree API.
[PoeTree](https://versologie.cz/poetree) is a standardized collection of poetry corpora comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish). Each corpus has been deduplicated, enriched with Universal Dependencies, provided with additional metadata and converted into a unified JSON structure.
## Install
```console
pip install poetree
```
or
```console
pip3 install poetree
```
## Usage
```python
import poetree
```
There are five classes available:
- ```Poetree``` represents the entire PoeTree collection [\[documentation\]](https://versologie.cz/poetree/python-doc/poetree)
- ```Corpus``` represents a single corpus [\[documentation\]](https://versologie.cz/poetree/python-doc/corpus)
- ```Author``` represents a single author [\[documentation\]](https://versologie.cz/poetree/python-doc/author)
- ```Source``` represents a single book [\[documentation\]](https://versologie.cz/poetree/python-doc/source)
- ```Poem``` represents a single poem [\[documentation\]](https://versologie.cz/poetree/python-doc/poem)
Each class has properties corresponding to the keys returned by the API. For instance, to get the number of authors and the number of poems in PoeTree.cs:
```python
corpus = poetree.Corpus('cs')
print('number of authors:', corpus.n_authors)
print('number of poems:', corpus.n_poems)
```
```console
number of authors: 606
number of poems: 80229
```
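Since these properties are ordinary attributes, they can be combined directly. The following is a minimal sketch that computes the average number of poems per author. `poems_per_author` is a hypothetical helper (it is not part of the library), and a `SimpleNamespace` stand-in carrying the figures printed above replaces `poetree.Corpus('cs')` so that the snippet runs offline:

```python
from types import SimpleNamespace


def poems_per_author(corpus):
    """Average number of poems per author (hypothetical helper, not part of poetree)."""
    # Relies only on the n_poems and n_authors properties shown above.
    return corpus.n_poems / corpus.n_authors


# Stand-in for poetree.Corpus('cs'), filled with the figures printed above.
corpus = SimpleNamespace(n_authors=606, n_poems=80229)
print(f'poems per author: {poems_per_author(corpus):.1f}')
```

With the real library, the same helper should accept a `poetree.Corpus` instance unchanged, as it only reads the two properties.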
### get_corpora(), get_authors(), get_sources(), get_poems()
A class instance may be created directly (as in the example above), or a more general class may be used to create a number of child instances by means of its ```get_[something]()``` methods. For instance, to get all the authors in PoeTree.en born between 1750 and 1760:
```python
corpus = poetree.Corpus('en')
for author in corpus.get_authors(born_after=1750, born_before=1760):
    print(f'{author.name} ({author.born})')
```
```console
Barlow, Joel (1754)
Blake, William (1757)
Crabbe, George (1754)
Freneau, Philip Morin (1752)
Roberts, David (1757)
Wheatley, Phillis (1753)
```
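The returned objects can be post-processed like any other Python objects. As a minimal sketch, the hypothetical helper `sort_by_birth` below (not part of the library) orders authors chronologically using only the `name` and `born` properties from the example above. It assumes `born` is numeric, and it uses `SimpleNamespace` stand-ins copied from the output above instead of real API results so that it runs without network access:

```python
from types import SimpleNamespace


def sort_by_birth(authors):
    """Return (born, name) pairs ordered by year of birth, then by name."""
    # Uses only the name and born properties shown in the example above.
    return sorted((author.born, author.name) for author in authors)


# Stand-ins for the objects returned by corpus.get_authors(...),
# copied from the output above.
authors = [
    SimpleNamespace(name='Barlow, Joel', born=1754),
    SimpleNamespace(name='Blake, William', born=1757),
    SimpleNamespace(name='Freneau, Philip Morin', born=1752),
]

for born, name in sort_by_birth(authors):
    print(born, name)
```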
Analogous methods in other classes are as follows:
<img src="https://versologie.cz/poetree/img/classes.png" style="width:300px;"/>
Objects created by ```get_[something]()``` are also stored in their parent object's property ```content_``` (keyed by the child type, e.g. ```content_['authors']```) for later use:
```python
corpus = poetree.Corpus('en')
corpus.get_authors(born_after=1750, born_before=1760)
print([x.name for x in corpus.content_['authors']])
```
```console
['Barlow, Joel', 'Blake, William', 'Crabbe, George', 'Freneau, Philip Morin', 'Roberts, David', 'Wheatley, Phillis']
```
To iterate over all corpora, over all authors, over all their poems:
```python
pt = poetree.Poetree()
for corpus in pt.get_corpora():
    for author in corpus.get_authors():
        for poem in author.get_poems():
            pass  # Do stuff...
```
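The nested loops above can be wrapped in a generator so that calling code deals with a single flat stream of poems. This is only a sketch: `iter_poems` and the three `Fake*` classes are hypothetical and exist purely to make the snippet runnable offline. They mimic nothing but the `get_corpora()`, `get_authors()` and `get_poems()` methods described above:

```python
def iter_poems(collection):
    """Yield (corpus, author, poem) for every poem in the collection."""
    for corpus in collection.get_corpora():
        for author in corpus.get_authors():
            for poem in author.get_poems():
                yield corpus, author, poem


# Minimal stand-ins that mimic only the three get_*() methods used above;
# with the real library one would pass poetree.Poetree() instead.
class FakeAuthor:
    def __init__(self, poems):
        self._poems = poems

    def get_poems(self):
        return self._poems


class FakeCorpus:
    def __init__(self, authors):
        self._authors = authors

    def get_authors(self):
        return self._authors


class FakeCollection:
    def __init__(self, corpora):
        self._corpora = corpora

    def get_corpora(self):
        return self._corpora


collection = FakeCollection([
    FakeCorpus([FakeAuthor(['poem A', 'poem B']), FakeAuthor(['poem C'])]),
    FakeCorpus([FakeAuthor(['poem D'])]),
])
poems = [poem for _, _, poem in iter_poems(collection)]
print(poems)
```

Because the generator is lazy, the caller can stop early (for example with `break`) without walking the rest of the collection.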
### metadata()
Each class provides a ```metadata()``` method. By default, it gives access to all metadata properties of the instance:
```python
corpus = poetree.Corpus('de')
metadata = corpus.metadata(output='pandas')
print(metadata)
```
```console
corpus desc n_authors n_poems n_lines n_types n_tokens
0 de Compiled by Klemens Bobenhausen and Benjamin H... 245 53133 1701234 678426 12482485
```
It can also return the metadata of all child instances that the object has created; the children are selected with the ```target``` argument:
```python
corpus = poetree.Corpus('pt')
corpus.get_authors()
df = corpus.metadata(target='authors', output='pandas')
print(df)
```
```console
id_ name viaf wiki country born died n_poems corpus
0 20 Alberto de Oliveira 29092555 Q1789301 br 1857 1937 1 pt
1 6 Antônio Gonçalves Dias 73988798 Q611997 br 1823 1864 15 pt
2 1 Augusto de Carvalho Rodrigues dos Anjos 44342515 Q769887 br 1884 1914 306 pt
3 11 Basílio da Gama 84492428 Q1789371 pt 1740 1795 6 pt
4 4 Cláudio Manuel da Costa 41967264 Q1789904 br 1729 1789 190 pt
5 25 Delminda Silveira de Sousa 99136109 Q10264874 br 1854 1932 548 pt
6 8 Emílio Nunes Correia de Meneses 71253555 Q10272640 br 1866 1918 90 pt
7 17 Francisco de Sá de Meneses 32338601 Q4403329 pt 1600 1664 12 pt
8 15 Gonçalves de Magalhães 88876973 Q2532628 br 1811 1882 56 pt
9 16 Gregório de Matos Guerra 122323020 Q983565 br 1636 1696 704 pt
10 13 Gustavo de Paula Teixeira 60882245 Q10293150 br 1881 1937 122 pt
11 23 José Pedro Xavier Pinheiro 121971768 Q15631816 br 1822 1882 100 pt
12 14 José Joaquim Correia de Almeida 50626334 Q25859802 br 1820 1905 953 pt
13 7 José de Santa Rita Durão 730145601965601320827 Q6958165 br 1722 1784 10 pt
14 5 João da Cruz e Sousa 4972695 Q2609059 br 1861 1898 407 pt
15 10 Juvêncio de Araújo Figueredo 4738150325583310090002 Q10313264 br 1865 1927 438 pt
16 24 Laurindo José da Silva Rabelo 65149066775765602956 Q6501838 br 1826 1864 184 pt
17 9 Luís Nicolau Fagundes Varela 24730690 Q2088965 br 1841 1875 215 pt
18 3 Luís Vaz de Camões 34454091 Q590 pt 1524 1580 10 pt
19 2 Manuel Maria Barbosa l'Hedois du Bocage 66536132 Q630116 pt 1765 1805 146 pt
20 18 Múcio Scevola Lopes Teixeira 11159986 Q10334840 br 1857 1926 86 pt
21 19 Nicolau Tolentino de Almeida 54191676 Q740967 pt 1740 1811 244 pt
22 21 Olavo Brás Martins dos Guimarães Bilac 73898176 Q982354 br 1865 1918 75 pt
23 22 Sebastião Cícero dos Guimarães Passos 25942646 Q10292895 br 1867 1909 75 pt
24 12 Tomás Antônio Gonzaga 19760014 Q1334602 br 1744 1810 110 pt
```
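Tabular metadata of this kind lends itself to simple aggregation. The sketch below totals `n_poems` per `country` using only the standard library. The rows are the first five rows of the table above, reduced to three columns and typed in by hand as dictionaries; with pandas installed, a comparable list could presumably be obtained from the DataFrame with `df.to_dict('records')`:

```python
from collections import defaultdict

# First five rows of the table above, reduced to three columns.
rows = [
    {'name': 'Alberto de Oliveira', 'country': 'br', 'n_poems': 1},
    {'name': 'Antônio Gonçalves Dias', 'country': 'br', 'n_poems': 15},
    {'name': 'Augusto de Carvalho Rodrigues dos Anjos', 'country': 'br', 'n_poems': 306},
    {'name': 'Basílio da Gama', 'country': 'pt', 'n_poems': 6},
    {'name': 'Cláudio Manuel da Costa', 'country': 'br', 'n_poems': 190},
]

# Sum the number of poems for each country code.
poems_by_country = defaultdict(int)
for row in rows:
    poems_by_country[row['country']] += row['n_poems']

print(dict(poems_by_country))
```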
### get_body(), get_all()
When a ```Poem``` instance is created, only its metadata are fetched from the API. To get the body (lines, words and their annotation), one needs to call the ```get_body()``` method first.
```python
poem = poetree.Poem(id_=1, lang='cs')
body = poem.get_body()
print(body[0])
```
```console
{'id_': 1, 'id': 0, 'id_stanza': 1, 'text': 'Tvá loď jde po vysokém moři,', 'part': False, 'words': [{'id_': 1, 'id': 1, 'id_sentence': 1, 'head': 2, 'deprel': 'det', 'form': 'Tvá', 'lemma': 'tvůj', 'upos': 'DET', 'xpos': 'PSFS1-S1------1', 'feats': 'Case=Nom|Gender=Fem|Number=Sing|PronType=Dem'}, {'id_': 2, 'id': 2, 'id_sentence': 1, 'head': 3, 'deprel': 'nsubj', 'form': 'loď', 'lemma': 'loď', 'upos': 'NOUN', 'xpos': 'NNFS1-----A----', 'feats': 'Case=Nom|Gender=Fem|Number=Sing|Polarity=Pos'}, {'id_': 3, 'id': 3, 'id_sentence': 1, 'head': 0, 'deprel': 'root', 'form': 'jde', 'lemma': 'jít', 'upos': 'VERB', 'xpos': 'VB-S---3P-AA---', 'feats': 'Mood=Ind|Number=Sing|Person=3|Polarity=Pos|Tense=Pres|VerbForm=Fin|Voice=Act'}, {'id_': 4, 'id': 4, 'id_sentence': 1, 'head': 6, 'deprel': 'case', 'form': 'po', 'lemma': 'po', 'upos': 'ADP', 'xpos': 'RR--6----------', 'feats': 'AdpType=Prep|Case=Loc'}, {'id_': 5, 'id': 5, 'id_sentence': 1, 'head': 6, 'deprel': 'amod', 'form': 'vysokém', 'lemma': 'vysoký', 'upos': 'ADJ', 'xpos': 'AANS6----1A----', 'feats': 'Case=Loc|Degree=Pos|Gender=Neut|Number=Sing|Polarity=Pos'}, {'id_': 6, 'id': 6, 'id_sentence': 1, 'head': 3, 'deprel': 'obl', 'form': 'moři', 'lemma': 'moře', 'upos': 'NOUN', 'xpos': 'NNNS6-----A----', 'feats': 'Case=Loc|Gender=Neut|Number=Sing|Polarity=Pos'}, {'id_': 7, 'id': 7, 'id_sentence': 1, 'head': 9, 'deprel': 'punct', 'form': ',', 'lemma': ',', 'upos': 'PUNCT', 'xpos': 'Z:-------------', 'feats': '_'}]}
```
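Each line is a plain dictionary, so its annotation can be unpacked with ordinary comprehensions. The sketch below works on an abridged, hand-copied version of `body[0]` from the output above (three of the seven words, a subset of the keys) rather than on a live API response. It assumes, in line with the Universal Dependencies convention, that `head` refers to the `id` of the governing word and that `0` marks the root:

```python
# Abridged copy of body[0] from the output above
# (three of the seven words, with only some of the keys kept).
line = {
    'text': 'Tvá loď jde po vysokém moři,',
    'words': [
        {'id': 1, 'head': 2, 'deprel': 'det', 'form': 'Tvá', 'lemma': 'tvůj', 'upos': 'DET'},
        {'id': 2, 'head': 3, 'deprel': 'nsubj', 'form': 'loď', 'lemma': 'loď', 'upos': 'NOUN'},
        {'id': 3, 'head': 0, 'deprel': 'root', 'form': 'jde', 'lemma': 'jít', 'upos': 'VERB'},
    ],
}

# (form, lemma, part of speech) for each word.
tokens = [(w['form'], w['lemma'], w['upos']) for w in line['words']]

# Resolve each head id to the form of the governing word (assumption: 0 = root).
forms = {w['id']: w['form'] for w in line['words']}
dependencies = [
    (w['form'], w['deprel'], 'ROOT' if w['head'] == 0 else forms[w['head']])
    for w in line['words']
]

print(tokens)
print(dependencies)
```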
To retrieve both the metadata and the body of the poem at the same time, use the ```get_all()``` method:
```python
poem = poetree.Poem(id_=1, lang='cs')
metadata_and_body = poem.get_all()
```
Raw data
{
"_id": null,
"home_page": "https://github.com/versotym/poetree",
"name": "poetree",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "poetry, corpus, versification",
"author": "Petr Plechac",
"author_email": "plechac@ucl.cas.cz",
"download_url": "https://files.pythonhosted.org/packages/86/0d/ead18fc8cc5bbe3fca30b4666b4d5f0b83e1373c39272877b2dcb8a1733d/poetree-0.0.2.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "An easy way to get data from PoeTree dataset",
"version": "0.0.2",
"project_urls": {
"Download": "https://github.com/versotym/poetree/archive/refs/tags/0.0.2.tar.gz",
"Homepage": "https://github.com/versotym/poetree"
},
"split_keywords": [
"poetry",
" corpus",
" versification"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "860dead18fc8cc5bbe3fca30b4666b4d5f0b83e1373c39272877b2dcb8a1733d",
"md5": "e234410f22f410c83802639cdfb3abc7",
"sha256": "c1e75e210ca03c318305074f019bcd532e98ef6b12b9cee351d4ab2fcca1d9d0"
},
"downloads": -1,
"filename": "poetree-0.0.2.tar.gz",
"has_sig": false,
"md5_digest": "e234410f22f410c83802639cdfb3abc7",
"packagetype": "sdist",
"python_version": "source",
"requires_python": null,
"size": 13572,
"upload_time": "2024-04-02T14:12:59",
"upload_time_iso_8601": "2024-04-02T14:12:59.237540Z",
"url": "https://files.pythonhosted.org/packages/86/0d/ead18fc8cc5bbe3fca30b4666b4d5f0b83e1373c39272877b2dcb8a1733d/poetree-0.0.2.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2024-04-02 14:12:59",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "versotym",
"github_project": "poetree",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "poetree"
}