poetree

- Name: poetree
- Version: 0.0.2
- Home page: https://github.com/versotym/poetree
- Summary: An easy way to get data from PoeTree dataset
- Upload time: 2024-04-02 14:12:59
- Author: Petr Plechac
- License: MIT
- Keywords: poetry, corpus, versification

# poetree
The poetree library provides an easy way to get data from the PoeTree API.

[PoeTree](https://versologie.cz/poetree) is a standardized collection of poetry corpora comprising over 330,000 poems in ten languages (Czech, English, French, German, Hungarian, Italian, Portuguese, Russian, Slovenian and Spanish). Each corpus has been deduplicated, enriched with Universal Dependencies, provided with additional metadata and converted into a unified JSON structure.

## Install
```console
pip install poetree
```
or
```console
pip3 install poetree
```

## Usage
```python
import poetree
```

There are five classes available:
- ```Poetree``` represents the entire PoeTree collection [\[documentation\]](https://versologie.cz/poetree/python-doc/poetree)
- ```Corpus``` represents a single corpus [\[documentation\]](https://versologie.cz/poetree/python-doc/corpus)
- ```Author``` represents a single author [\[documentation\]](https://versologie.cz/poetree/python-doc/author)
- ```Source``` represents a single book [\[documentation\]](https://versologie.cz/poetree/python-doc/source)
- ```Poem``` represents a single poem [\[documentation\]](https://versologie.cz/poetree/python-doc/poem)

Each class has properties corresponding to the keys returned by the API. For instance, to get the number of authors and the number of poems in PoeTree.cs:

```python
corpus = poetree.Corpus('cs')
print('number of authors:', corpus.n_authors)
print('number of poems:', corpus.n_poems)
```
```console
number of authors: 606
number of poems: 80229
```

### get_corpora(), get_authors(), get_sources(), get_poems()
A class instance may be created directly (as in the example above), or a more general class may be used to create a number of child instances by means of its ```get_[something]()``` methods. For instance, to get all the authors in PoeTree.en born between 1750 and 1760:

```python
corpus = poetree.Corpus('en')
for author in corpus.get_authors(born_after=1750, born_before=1760):
    print(f'{author.name} ({author.born})')
```
```console
Barlow, Joel (1754)
Blake, William (1757)
Crabbe, George (1754)
Freneau, Philip Morin (1752)
Roberts, David (1757)
Wheatley, Phillis (1753)
```

Analogous methods in other classes are as follows:

<img src="https://versologie.cz/poetree/img/classes.png" style="width:300px;"/>

Objects created by ```get_[something]()``` are also stored in the parent object's ```content_``` property, keyed by type (e.g. ```content_['authors']```), for later use:

```python
corpus = poetree.Corpus('en')
corpus.get_authors(born_after=1750, born_before=1760)
print([x.name for x in corpus.content_['authors']])
```
```console
['Barlow, Joel', 'Blake, William', 'Crabbe, George', 'Freneau, Philip Morin', 'Roberts, David', 'Wheatley, Phillis']
```

To iterate over all corpora, over all authors, over all their poems:
```python
pt = poetree.Poetree()
for corpus in pt.get_corpora():
    for author in corpus.get_authors():
        for poem in author.get_poems():
            pass  # Do stuff...
```
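The nested loops above can be wrapped in a small generator so that calling code deals with a single flat stream of poems. This is only a sketch: ```iter_poems``` is a hypothetical helper, not part of the poetree library, and it relies solely on the ```get_corpora()```, ```get_authors()``` and ```get_poems()``` methods shown above. Keyword arguments are passed through to ```get_authors()``` unchanged.

```python
def iter_poems(collection, **author_filters):
    """Yield (corpus, author, poem) for every poem reachable from `collection`.

    `collection` only needs a get_corpora() method; each corpus needs
    get_authors(), and each author needs get_poems(). Any keyword
    arguments are forwarded to get_authors() as filters.
    """
    for corpus in collection.get_corpora():
        for author in corpus.get_authors(**author_filters):
            for poem in author.get_poems():
                yield corpus, author, poem


# Intended use with the library (needs access to the PoeTree API):
#   import poetree
#   for corpus, author, poem in iter_poems(poetree.Poetree(),
#                                          born_after=1750, born_before=1760):
#       ...
```

Because the helper is a generator, poems are requested lazily as the loop advances rather than all up front.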

### metadata()
Each class provides a ```metadata()``` method. By default, it returns all metadata properties of the instance:

```python
corpus = poetree.Corpus('de')
metadata = corpus.metadata(output='pandas')
print(metadata)
```
```console
  corpus                                               desc  n_authors  n_poems  n_lines  n_types  n_tokens
0     de  Compiled by Klemens Bobenhausen and Benjamin H...        245    53133  1701234   678426  12482485
```

It can also return the metadata of all child instances the object has created; select them with the ```target``` argument:
```python
corpus = poetree.Corpus('pt')
corpus.get_authors()
df = corpus.metadata(target='authors', output='pandas') 
print(df)
```
```console
    id_                                     name                    viaf       wiki country  born  died  n_poems corpus
0    20                      Alberto de Oliveira                29092555   Q1789301      br  1857  1937        1     pt
1     6                   Antônio Gonçalves Dias                73988798    Q611997      br  1823  1864       15     pt
2     1  Augusto de Carvalho Rodrigues dos Anjos                44342515    Q769887      br  1884  1914      306     pt
3    11                          Basílio da Gama                84492428   Q1789371      pt  1740  1795        6     pt
4     4                  Cláudio Manuel da Costa                41967264   Q1789904      br  1729  1789      190     pt
5    25               Delminda Silveira de Sousa                99136109  Q10264874      br  1854  1932      548     pt
6     8          Emílio Nunes Correia de Meneses                71253555  Q10272640      br  1866  1918       90     pt
7    17               Francisco de Sá de Meneses                32338601   Q4403329      pt  1600  1664       12     pt
8    15                   Gonçalves de Magalhães                88876973   Q2532628      br  1811  1882       56     pt
9    16                 Gregório de Matos Guerra               122323020    Q983565      br  1636  1696      704     pt
10   13                Gustavo de Paula Teixeira                60882245  Q10293150      br  1881  1937      122     pt
11   23               José Pedro Xavier Pinheiro               121971768  Q15631816      br  1822  1882      100     pt
12   14          José Joaquim Correia de Almeida                50626334  Q25859802      br  1820  1905      953     pt
13    7                 José de Santa Rita Durão   730145601965601320827   Q6958165      br  1722  1784       10     pt
14    5                     João da Cruz e Sousa                 4972695   Q2609059      br  1861  1898      407     pt
15   10             Juvêncio de Araújo Figueredo  4738150325583310090002  Q10313264      br  1865  1927      438     pt
16   24            Laurindo José da Silva Rabelo    65149066775765602956   Q6501838      br  1826  1864      184     pt
17    9             Luís Nicolau Fagundes Varela                24730690   Q2088965      br  1841  1875      215     pt
18    3                       Luís Vaz de Camões                34454091       Q590      pt  1524  1580       10     pt
19    2  Manuel Maria Barbosa l'Hedois du Bocage                66536132    Q630116      pt  1765  1805      146     pt
20   18             Múcio Scevola Lopes Teixeira                11159986  Q10334840      br  1857  1926       86     pt
21   19             Nicolau Tolentino de Almeida                54191676    Q740967      pt  1740  1811      244     pt
22   21   Olavo Brás Martins dos Guimarães Bilac                73898176    Q982354      br  1865  1918       75     pt
23   22    Sebastião Cícero dos Guimarães Passos                25942646  Q10292895      br  1867  1909       75     pt
24   12                    Tomás Antônio Gonzaga                19760014   Q1334602      br  1744  1810      110     pt
```

### get_body(), get_all()

When a ```Poem``` instance is created, only its metadata are fetched from the API. To get the body (lines, words and their annotation), the ```get_body()``` method must be called first.

```python
poem = poetree.Poem(id_=1, lang='cs')
body = poem.get_body()
print(body[0])
```
```console
{'id_': 1, 'id': 0, 'id_stanza': 1, 'text': 'Tvá loď jde po vysokém moři,', 'part': False, 'words': [{'id_': 1, 'id': 1, 'id_sentence': 1, 'head': 2, 'deprel': 'det', 'form': 'Tvá', 'lemma': 'tvůj', 'upos': 'DET', 'xpos': 'PSFS1-S1------1', 'feats': 'Case=Nom|Gender=Fem|Number=Sing|PronType=Dem'}, {'id_': 2, 'id': 2, 'id_sentence': 1, 'head': 3, 'deprel': 'nsubj', 'form': 'loď', 'lemma': 'loď', 'upos': 'NOUN', 'xpos': 'NNFS1-----A----', 'feats': 'Case=Nom|Gender=Fem|Number=Sing|Polarity=Pos'}, {'id_': 3, 'id': 3, 'id_sentence': 1, 'head': 0, 'deprel': 'root', 'form': 'jde', 'lemma': 'jít', 'upos': 'VERB', 'xpos': 'VB-S---3P-AA---', 'feats': 'Mood=Ind|Number=Sing|Person=3|Polarity=Pos|Tense=Pres|VerbForm=Fin|Voice=Act'}, {'id_': 4, 'id': 4, 'id_sentence': 1, 'head': 6, 'deprel': 'case', 'form': 'po', 'lemma': 'po', 'upos': 'ADP', 'xpos': 'RR--6----------', 'feats': 'AdpType=Prep|Case=Loc'}, {'id_': 5, 'id': 5, 'id_sentence': 1, 'head': 6, 'deprel': 'amod', 'form': 'vysokém', 'lemma': 'vysoký', 'upos': 'ADJ', 'xpos': 'AANS6----1A----', 'feats': 'Case=Loc|Degree=Pos|Gender=Neut|Number=Sing|Polarity=Pos'}, {'id_': 6, 'id': 6, 'id_sentence': 1, 'head': 3, 'deprel': 'obl', 'form': 'moři', 'lemma': 'moře', 'upos': 'NOUN', 'xpos': 'NNNS6-----A----', 'feats': 'Case=Loc|Gender=Neut|Number=Sing|Polarity=Pos'}, {'id_': 7, 'id': 7, 'id_sentence': 1, 'head': 9, 'deprel': 'punct', 'form': ',', 'lemma': ',', 'upos': 'PUNCT', 'xpos': 'Z:-------------', 'feats': '_'}]}
```
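Each line is a plain dictionary, so it can be processed with ordinary Python. The sketch below works on a copy of the line printed above (abridged to the fields it uses) and extracts lemmas and dependency edges. ```content_lemmas``` and ```dependency_edges``` are hypothetical helpers, not library functions. It assumes that ```head``` refers to the ```id``` of another word in the same sentence and that ```0``` marks the root, as the sample suggests; a head that lies on another line (here the comma, whose head is word 9) is reported as ```None```.

```python
# One line as returned by get_body(), abridged to the fields used below.
line = {
    'text': 'Tvá loď jde po vysokém moři,',
    'words': [
        {'id': 1, 'id_sentence': 1, 'head': 2, 'deprel': 'det', 'form': 'Tvá', 'lemma': 'tvůj', 'upos': 'DET'},
        {'id': 2, 'id_sentence': 1, 'head': 3, 'deprel': 'nsubj', 'form': 'loď', 'lemma': 'loď', 'upos': 'NOUN'},
        {'id': 3, 'id_sentence': 1, 'head': 0, 'deprel': 'root', 'form': 'jde', 'lemma': 'jít', 'upos': 'VERB'},
        {'id': 4, 'id_sentence': 1, 'head': 6, 'deprel': 'case', 'form': 'po', 'lemma': 'po', 'upos': 'ADP'},
        {'id': 5, 'id_sentence': 1, 'head': 6, 'deprel': 'amod', 'form': 'vysokém', 'lemma': 'vysoký', 'upos': 'ADJ'},
        {'id': 6, 'id_sentence': 1, 'head': 3, 'deprel': 'obl', 'form': 'moři', 'lemma': 'moře', 'upos': 'NOUN'},
        {'id': 7, 'id_sentence': 1, 'head': 9, 'deprel': 'punct', 'form': ',', 'lemma': ',', 'upos': 'PUNCT'},
    ],
}


def content_lemmas(line, skip_upos=('PUNCT',)):
    """Return the lemmas of a line, leaving out words with a skipped UPOS tag."""
    return [w['lemma'] for w in line['words'] if w['upos'] not in skip_upos]


def dependency_edges(line):
    """Return (dependent form, relation, head form) triples for one line.

    The head form is 'ROOT' when head == 0 and None when the head word
    is not part of this line (assumed to sit on a neighbouring line).
    """
    forms = {(w['id_sentence'], w['id']): w['form'] for w in line['words']}
    edges = []
    for w in line['words']:
        if w['head'] == 0:
            head_form = 'ROOT'
        else:
            head_form = forms.get((w['id_sentence'], w['head']))
        edges.append((w['form'], w['deprel'], head_form))
    return edges


print(content_lemmas(line))
for edge in dependency_edges(line):
    print(edge)
```

To resolve heads that cross line boundaries, the same lookup table could be built from all lines of the poem before collecting the edges.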

To retrieve both the metadata and the body of the poem at the same time, use the ```get_all()``` method:

```python
poem = poetree.Poem(id_=1, lang='cs')
metadata_and_body = poem.get_all()
```

            
