# Perdido Geoparser Python library
[![PyPI](https://img.shields.io/pypi/v/perdido)](https://pypi.org/project/perdido)
[![PyPI - License](https://img.shields.io/pypi/l/perdido?color=yellow)](https://github.com/ludovicmoncla/perdido/blob/main/LICENSE)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/perdido)
## Installation
To install the latest stable version, you can use:
```bash
pip install --upgrade perdido
```
## Quick start
### Geoparsing
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ludovicmoncla/perdido/main?labpath=notebooks%2Fdemo_Geoparser.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)
#### Import
```python
from perdido.geoparser import Geoparser
```
#### Run geoparser
```python
text = "J'ai rendez-vous proche de la place Bellecour, de la place des Célestins, au sud de la fontaine des Jacobins et près du pont Bonaparte."
geoparser = Geoparser()
doc = geoparser(text)
```
Some parameters can be set when initializing the `Geoparser` object:
* `version`: *Standard* (default), *Encyclopedie*
* `pos_tagger`: *spacy* (default), *stanza*, and *treetagger*
#### Get tokens
* Access token attributes (text, lemma and [UPOS](https://universaldependencies.org/u/pos/) part-of-speech tag):
```python
for token in doc:
    print(f'{token.text}\tlemma: {token.lemma}\tpos: {token.pos}')
```
* Get the IOB format:
```python
for token in doc:
    print(token.iob_format())
```
* Get a TSV-IOB format:
```python
for token in doc:
    print(token.tsv_format())
```
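IOB tags mark the beginning (`B-`) and continuation (`I-`) of each entity, so consecutive tokens can be merged back into entity spans. The following is a minimal sketch of that grouping, not part of the Perdido API: the `(token, tag)` pairs are hand-written, the `group_iob` helper is hypothetical, and the exact string returned by `token.iob_format()` may differ from this layout.

```python
# Hypothetical (token, IOB tag) pairs written by hand for illustration;
# the real output of token.iob_format() may be laid out differently.
tagged_tokens = [
    ("la", "O"),
    ("place", "B-place"),
    ("Bellecour", "I-place"),
    (",", "O"),
    ("pont", "B-place"),
    ("Bonaparte", "I-place"),
]


def group_iob(pairs):
    """Merge B-/I- tagged tokens into (entity text, label) tuples."""
    entities, words, label = [], [], None
    for text, tag in pairs:
        starts_entity = tag.startswith("B-") or (
            tag.startswith("I-") and label != tag[2:]
        )
        if starts_entity:
            # A new entity begins: store the previous one, if any
            if words:
                entities.append((" ".join(words), label))
            words, label = [text], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity
            words.append(text)
        else:
            # "O" tag: close the current entity, if any
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


print(group_iob(tagged_tokens))
```

An `I-` tag that does not follow a matching `B-` tag is treated as the start of a new entity, which keeps the helper tolerant of slightly malformed input.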
#### Print the XML-TEI output
```python
print(doc.tei)
```
#### Print the XML-TEI output with XML syntax highlighting
```python
from display_xml import XML
XML(doc.tei, style='lovelace')
```
#### Print the GeoJSON output
```python
print(doc.geojson)
```
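GeoJSON stores each position as `[longitude, latitude]`, which is easy to get backwards when reading coordinates. The sketch below walks a FeatureCollection with the standard `json` module. The sample document is hand-written with approximate coordinates, the `name` property and the `point_coordinates` helper are assumptions for illustration, and the features Perdido actually emits may carry different properties. If `doc.geojson` is a string rather than a dictionary, it would need `json.loads` first.

```python
import json

# Hypothetical GeoJSON written by hand for illustration; the properties
# that Perdido actually emits may differ.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [4.8322, 45.7578]},
      "properties": {"name": "place Bellecour"}
    }
  ]
}
""")


def point_coordinates(feature_collection):
    """Return (name, latitude, longitude) for every Point feature."""
    rows = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        # GeoJSON positions are ordered [longitude, latitude]
        lon, lat = geometry["coordinates"][:2]
        name = (feature.get("properties") or {}).get("name")
        rows.append((name, lat, lon))
    return rows


for name, lat, lon in point_coordinates(sample):
    print(f"{name}\tlatitude: {lat}\tlongitude: {lon}")
```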
#### Get the list of named entities
```python
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource {t.source}')
```
#### Get the list of nested named entities
```python
for nested_entity in doc.nested_named_entities:
    print(f'entity: {nested_entity.text}\ttag: {nested_entity.tag}')
    if nested_entity.tag == 'place':
        for t in nested_entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource {t.source}')
```
#### Get the list of spatial relations
```python
for sp_relation in doc.sp_relations:
    print(f'spatial relation: {sp_relation.text}\ttag: {sp_relation.tag}')
```
#### Show named entities and nested named entities using the displacy library from spaCy
```python
from spacy import displacy
displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True)
```
```python
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)
```
#### Display the map (using the folium library)
```python
doc.get_folium_map()
```
#### Saving results
```python
doc.to_xml('filename.xml')
```
```python
doc.to_geojson('filename.geojson')
```
```python
doc.to_iob('filename.tsv')
```
```python
doc.to_csv('filename.csv')
```
### Geocoding
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ludovicmoncla/perdido/main?labpath=notebooks%2Fdemo_Geocoder.ipynb)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geocoder.ipynb)
#### Import
```python
from perdido.geocoder import Geocoder
```
#### Geocode a single place name
```python
geocoder = Geocoder()
doc = geocoder('Lyon')
```
Some parameters can be set when initializing the `Geocoder` object:
* `sources`:
* `max_rows`:
* `country_code`:
* `bbox`:
#### Geocode a list of place names
```python
geocoder = Geocoder()
doc = geocoder(['Lyon', 'la place des Célestins', 'la fontaine des Jacobins'])
```
#### Get the GeoJSON result
```python
print(doc.geojson)
```
#### Get the list of toponym candidates
```python
for t in doc.toponyms:
    print(f'lat: {t.lat}\tlng: {t.lng}\tsource {t.source}\tsourceName {t.source_name}')
```
#### Get the toponym candidates as a GeoDataFrame
```python
print(doc.to_geodataframe())
```
# Perdido Geoparser REST APIs
[http://choucas.univ-pau.fr/docs#](http://choucas.univ-pau.fr/docs#/)
## Example: call REST API in Python
```python
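import requests

# Base URL of the Perdido web services and the service to call
url = 'http://choucas.univ-pau.fr/PERDIDO/api/'
service = 'geoparsing'
# The text to process goes in the JSON body; the API key is a query parameter
data = {'content': 'Je visite la ville de Lyon, Annecy et le Mont-Blanc.'}
parameters = {'api_key': 'demo'}
r = requests.post(url+service, params=parameters, json=data)
print(r.text)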
```
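To see what the call above puts on the wire, the same request can be assembled with the standard library without sending it. This is only a sketch mirroring the `requests` example: the `build_request` helper and the `BASE_URL` constant are hypothetical names introduced here, and nothing is assumed about the format of the response.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Same base URL as in the requests example
BASE_URL = "http://choucas.univ-pau.fr/PERDIDO/api/"


def build_request(service, content, api_key="demo"):
    """Build (without sending) a POST request equivalent to the example."""
    # The API key travels in the query string
    query = urlencode({"api_key": api_key})
    # The text to process travels as a JSON body
    body = json.dumps({"content": content}).encode("utf-8")
    return Request(
        f"{BASE_URL}{service}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("geoparsing", "Je visite la ville de Lyon.")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen(req, timeout=30)` would send it; setting a timeout keeps a script from hanging if the service is unreachable.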
# Tutorials
- [Perdido: Python library for geoparsing and geocoding French texts](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/perdido-geoparser-GeoExT-ECIR23.ipynb): presented at the 1st GeoExT International Workshop at the ECIR 2023 conference.
- [Perdido Geoparser tutorial](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)
- [Perdido Geocoder tutorial](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/demo_Geocoder.ipynb)
- [Perdido Web services tutorial](https://github.com/ludovicmoncla/perdido/blob/main/notebooks/demo_WebServices.ipynb)
# Cite this work
> Moncla, L. and Gaio, M. (2023). Perdido: Python library for geoparsing and geocoding French texts. In proceedings of the First International Workshop on Geographic Information Extraction from Texts (GeoExT'23), ECIR Conference, Dublin, Ireland.
# Acknowledgements
``Perdido`` is an active project still under development.
This work was partially supported by the following projects:
* [GEODE](https://geode-project.github.io) (2020-2024): [LabEx ASLAN](https://aslan.universite-lyon.fr) (ANR-10-LABX-0081)
* [GeoDISCO](https://www.msh-lse.fr/projets/geodisco/) (2019-2020): [MSH Lyon St-Etienne](https://www.msh-lse.fr) (ANR‐16‐IDEX‐0005)
* [CHOUCAS](http://choucas.ign.fr) (2017-2022): [ANR](https://anr.fr/Projet-ANR-16-CE23-0018) (ANR-16-CE23-0018)
* [PERDIDO](http://erig.univ-pau.fr/PERDIDO/) (2012-2015): [CDAPP](https://www.pau.fr/) and [IGN](https://www.ign.fr)