| Field | Value |
| --- | --- |
| Name | perdido |
| Version | 0.1.49 |
| Home page | https://github.com/ludovicmoncla/perdido |
| Summary | PERDIDO Geoparser python library |
| Upload time | 2023-09-18 06:29:45 |
| Author | Ludovic Moncla |
| License | BSD-Clause-2 |
| Keywords | geoparsing, named-entity-recognition, geographic-information-retrieval, toponym-resolution, toponym-disambiguation |

# Perdido Geoparser Python library


[![PyPI](https://img.shields.io/pypi/v/perdido)](https://pypi.org/project/perdido)
[![PyPI - License](https://img.shields.io/pypi/l/perdido?color=yellow)](https://github.com/ludovicmoncla/perdido/blob/main/LICENSE)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/perdido)



## Installation

To install the latest stable version, you can use:
```bash
pip install --upgrade perdido
```


## Quick start


### Geoparsing

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ludovicmoncla/perdido/main?labpath=notebooks%2Fdemo_Geoparser.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)

#### Import

```python
from perdido.geoparser import Geoparser
```

#### Run geoparser

```python
text = "J'ai rendez-vous proche de la place Bellecour, de la place des Célestins, au sud de la fontaine des Jacobins et près du pont Bonaparte."
geoparser = Geoparser()
doc = geoparser(text)
```

Some parameters can be set when initializing the `Geoparser` object:

* `version`: *Standard* (default), *Encyclopedie*
* `pos_tagger`: *spacy* (default), *stanza*, and *treetagger*


#### Get tokens

* Access token attributes (text, lemma and [UPOS](https://universaldependencies.org/u/pos/) part-of-speech tag):

```python
for token in doc:
    print(f'{token.text}\tlemma: {token.lemma}\tpos: {token.pos}')
```

* Get the IOB format:

```python
for token in doc:
    print(token.iob_format())
```

* Get a TSV-IOB format:

```python
for token in doc:
    print(token.tsv_format())
```
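
The IOB scheme marks the first token of an entity with `B-`, continuation tokens with `I-`, and everything else with `O`. As a minimal sketch of how such tags can be regrouped into entity spans, the block below uses hand-written `(token, tag)` pairs; the sample data and the `iob_to_spans` helper are illustrative assumptions, and the exact column layout returned by `iob_format()` may differ.

```python
# Illustrative IOB-tagged tokens (hand-written, not actual Perdido output).
tagged = [
    ("la", "O"),
    ("place", "B-place"),
    ("Bellecour", "I-place"),
    (",", "O"),
    ("pont", "B-place"),
    ("Bonaparte", "I-place"),
]


def iob_to_spans(tagged_tokens):
    """Group (token, IOB tag) pairs into (entity text, label) spans."""
    spans, current, label = [], [], None
    for token, tag in tagged_tokens:
        prefix, _, tag_label = tag.partition("-")
        if prefix == "I" and current and tag_label == label:
            current.append(token)  # continue the open entity
            continue
        if current:  # any other tag closes the open entity
            spans.append((" ".join(current), label))
            current, label = [], None
        if prefix in ("B", "I"):  # start a new entity (tolerates a stray I- tag)
            current, label = [token], tag_label
    if current:  # flush an entity that runs to the end of the input
        spans.append((" ".join(current), label))
    return spans


print(iob_to_spans(tagged))
```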

#### Print the XML-TEI output

```python
print(doc.tei)
```

#### Print the XML-TEI output with XML syntax highlighting

```python
from display_xml import XML
XML(doc.tei, style='lovelace')
```

#### Print the GeoJSON output

```python
print(doc.geojson)
```
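
GeoJSON stores point coordinates in `[longitude, latitude]` order, which is easy to get backwards. The sketch below reads a `FeatureCollection` with the standard `json` module and returns latitude and longitude explicitly. The sample string, its approximate coordinates, the `name` property, and the `point_features` helper are assumptions made for illustration; inspect `doc.geojson` to see which properties Perdido actually emits.

```python
import json

# Hand-written sample in the standard GeoJSON layout (RFC 7946); the property
# name and the approximate coordinates are illustrative, not Perdido output.
sample = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [4.8320, 45.7578]},
     "properties": {"name": "place Bellecour"}}
  ]
}
"""


def point_features(geojson):
    """Return (name, latitude, longitude) for every Point feature."""
    # Accept either a JSON string or an already-parsed dict.
    data = json.loads(geojson) if isinstance(geojson, str) else geojson
    rows = []
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        lng, lat = geometry["coordinates"][:2]  # GeoJSON order: longitude first
        properties = feature.get("properties") or {}
        rows.append((properties.get("name"), lat, lng))
    return rows


print(point_features(sample))
```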

#### Get the list of named entities

```python
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource {t.source}')
```

#### Get the list of nested named entities

```python
for nested_entity in doc.nested_named_entities:
    print(f'entity: {nested_entity.text}\ttag: {nested_entity.tag}')
    if nested_entity.tag == 'place':
        for t in nested_entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource {t.source}')
```

#### Get the list of spatial relations

```python
for sp_relation in doc.sp_relations:
    print(f'spatial relation: {sp_relation.text}\ttag: {sp_relation.tag}')
```

#### Show named entities and nested named entities using the displaCy library from spaCy

```python
from spacy import displacy

displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True)
```

```python
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)
```

#### Display the map (using the folium library)

```python
doc.get_folium_map()
```

#### Saving results

```python
doc.to_xml('filename.xml')
```

```python
doc.to_geojson('filename.geojson')
```

```python
doc.to_iob('filename.tsv')
```

```python
doc.to_csv('filename.csv')
```

### Geocoding

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ludovicmoncla/perdido/main?labpath=notebooks%2Fdemo_Geocoder.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geocoder.ipynb)

#### Import

```python
from perdido.geocoder import Geocoder
```

#### Geocode a single place name

```python
geocoder = Geocoder()
doc = geocoder('Lyon')
```

Some parameters can be set when initializing the `Geocoder` object:

* `sources`: 
* `max_rows`: 
* `country_code`: 
* `bbox`: 

#### Geocode a list of place names

```python
geocoder = Geocoder()
doc = geocoder(['Lyon', 'la place des Célestins', 'la fontaine des Jacobins'])
```

#### Get the GeoJSON result

```python
print(doc.geojson)
```

#### Get the list of toponym candidates

```python
for t in doc.toponyms: 
    print(f'lat: {t.lat}\tlng: {t.lng}\tsource {t.source}\tsourceName {t.source_name}')
```

#### Get the toponym candidates as a GeoDataFrame

```python
print(doc.to_geodataframe())
```




# Perdido Geoparser REST APIs

[http://choucas.univ-pau.fr/docs#](http://choucas.univ-pau.fr/docs#/)


## Example: call the REST API in Python

```python
import requests

url = 'http://choucas.univ-pau.fr/PERDIDO/api/'
service = 'geoparsing'
data = {'content': 'Je visite la ville de Lyon, Annecy et le Mont-Blanc.'}
parameters = {'api_key': 'demo'}

r = requests.post(url+service, params=parameters, json=data)

print(r.text)
```
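
When `requests` is not available, the same call can be sketched with the standard library alone. The endpoint, the `api_key` query parameter, and the `content` field come from the example above; the `build_request` and `call_service` helpers, the 30-second timeout, and the UTF-8 decoding of the response are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://choucas.univ-pau.fr/PERDIDO/api/"


def build_request(service, content, api_key="demo"):
    """Build (without sending) a POST request for a Perdido web service."""
    query = urllib.parse.urlencode({"api_key": api_key})  # goes in the URL
    body = json.dumps({"content": content}).encode("utf-8")  # JSON payload
    return urllib.request.Request(
        f"{BASE_URL}{service}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_service(service, content, api_key="demo", timeout=30):
    """Send the request; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    req = build_request(service, content, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read().decode("utf-8")  # assumes a UTF-8 response body


# Building the request does not touch the network; call_service() does.
req = build_request("geoparsing", "Je visite la ville de Lyon, Annecy et le Mont-Blanc.")
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the URL and payload before contacting the server.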



# Tutorials

- [Perdido: Python library for geoparsing and geocoding French texts](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/perdido-geoparser-GeoExT-ECIR23.ipynb): presented at the 1st GeoExT International Workshop at the ECIR 2023 conference.
- [Perdido Geoparser tutorial](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)
- [Perdido Geocoder tutorial](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/demo_Geocoder.ipynb)
- [Perdido Web services tutorial](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/demo_WebServices.ipynb)



# Cite this work

> Moncla, L. and Gaio, M. (2023). Perdido: Python library for geoparsing and geocoding French texts. In Proceedings of the First International Workshop on Geographic Information Extraction from Texts (GeoExT'23), ECIR Conference, Dublin, Ireland.



# Acknowledgements

``Perdido`` is an active project that is still under development.

This work was partially supported by the following projects:
* [GEODE](https://geode-project.github.io) (2020-2024): [LabEx ASLAN](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081)
* [GeoDISCO](https://www.msh-lse.fr/projets/geodisco/) (2019-2020): [MSH Lyon St-Etienne](https://www.msh-lse.fr) (ANR-16-IDEX-0005)
* [CHOUCAS](http://choucas.ign.fr) (2017-2022): [ANR](https://anr.fr/Projet-ANR-16-CE23-0018) (ANR-16-CE23-0018)
* [PERDIDO](http://erig.univ-pau.fr/PERDIDO/) (2012-2015): [CDAPP](https://www.pau.fr/) and [IGN](https://www.ign.fr)

            
