geograpy3


* Name: geograpy3
* Version: 0.2.7
* home_page: https://github.com/somnathrakshit/geograpy3
* Summary: Extract countries, regions and cities from a URL or text
* upload_time: 2023-09-29 08:23:46
* author: Somnath Rakshit
* license: Apache
* requirements: newspaper3k, nltk, jellyfish, numpy, pylodstorage, sphinx-rtd-theme, scikit-learn, pandas, geopy, OSMPythonTools

# geograpy3
[![Join the discussion at https://github.com/somnathrakshit/geograpy3/discussions](https://shields.io/badge/GitHub-%20Discussions-blue?logo=github)](https://github.com/somnathrakshit/geograpy3/discussions)
[![Documentation Status](https://readthedocs.org/projects/geograpy3/badge/?version=latest)](https://geograpy3.readthedocs.io/en/latest/?badge=latest)
[![pypi](https://img.shields.io/pypi/pyversions/geograpy3)](https://pypi.org/project/geograpy3/)
[![Github Actions Build](https://github.com/somnathrakshit/geograpy3/workflows/Build/badge.svg?branch=master)](https://github.com/somnathrakshit/geograpy3/actions?query=workflow%3ABuild+branch%3Amaster)
[![PyPI Status](https://img.shields.io/pypi/v/geograpy3.svg)](https://pypi.python.org/pypi/geograpy3/)
[![Downloads](https://pepy.tech/badge/geograpy3)](https://pepy.tech/project/geograpy3)
[![GitHub issues](https://img.shields.io/github/issues/somnathrakshit/geograpy3.svg)](https://github.com/somnathrakshit/geograpy3/issues)
[![GitHub closed issues](https://img.shields.io/github/issues-closed/somnathrakshit/geograpy3.svg)](https://github.com/somnathrakshit/geograpy3/issues/?q=is%3Aissue+is%3Aclosed)
[![License](https://img.shields.io/github/license/somnathrakshit/geograpy3.svg)](https://www.apache.org/licenses/LICENSE-2.0)

geograpy3 is a fork of [geograpy2](https://github.com/Corollarium/geograpy2), which is itself a fork of [geograpy](https://github.com/ushahidi/geograpy). geograpy2 inherits most of geograpy but solves several problems, such as support for UTF-8, place names
with multiple words, and confusion over homonyms. In addition, geograpy3 is compatible with Python 3, unlike geograpy2.

Since geograpy3 0.0.2, cities, countries and regions are matched against a database derived from the corresponding Wikidata entries.

What it is
==========

geograpy3 extracts place names from a URL or text, and adds context to those names -- for example distinguishing between a country, region or city.

The extraction is a two-step process. The first step is a Natural Language Processing task that analyzes a text for potential mentions of geographic locations. In the second step, the words that represent such locations are looked up using the Locator.

If you already know that your content has geographic information you might want to use the Locator interface directly.
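
The two steps can be pictured with a toy sketch in plain Python. This is not geograpy3's implementation: a regular expression stands in for the NLTK entity recognition, and a small hand-written dictionary (`GAZETTEER`, a hypothetical name) stands in for the Wikidata-derived database behind the Locator.

```python
import re

# Hypothetical stand-in for the Wikidata-derived lookup database.
GAZETTEER = {
    "Cleveland": "city",
    "Ohio": "region",
    "United States": "country",
}


def find_candidates(text):
    """Step 1 (toy): collect runs of capitalized words as possible place names."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)


def look_up(candidates):
    """Step 2 (toy): keep only candidates the gazetteer knows, with their kind."""
    return {name: GAZETTEER[name] for name in candidates if name in GAZETTEER}


text = "Maria moved from Cleveland in Ohio to another part of the United States."
candidates = find_candidates(text)
print(candidates)
print(look_up(candidates))
```

The first step deliberately over-collects (it also picks up "Maria"); the second step is what discards candidates that are not known places and attaches the country, region or city label.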

## Examples/Tutorial
* [see Examples/Tutorial Wiki](http://wiki.bitplan.com/index.php/Geograpy#Examples)

## Install & Setup

Grab the package using `pip` (this will take a few minutes):
```bash
pip install geograpy3
```

geograpy3 uses [NLTK](http://www.nltk.org/) for entity recognition, so you'll also need
to download the models we're using. Fortunately there's a command that'll take
care of this for you.
```bash
geograpy-nltk
```

## Getting the source code
```bash
git clone https://github.com/somnathrakshit/geograpy3
cd geograpy3
scripts/install
```

## Basic Usage

Import the module, give some text or a URL, and presto.
```python
import geograpy
url = 'https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay'
places = geograpy.get_geoPlace_context(url=url)
```

Now you have access to information about all the places mentioned in the linked
article.

* `places.countries` _contains a list of country names_
* `places.regions` _contains a list of region names_
* `places.cities` _contains a list of city names_
* `places.other` _lists everything that wasn't clearly a country, region or city_

Note that the `other` list might be useful for shorter texts, to pull out
information like street names, points of interest, etc., but at the moment it is
a bit messy when scanning longer texts that contain adjectival forms of proper
nouns (like "Russian" instead of "Russia").

## But Wait, There's More

In addition to listing the names of discovered places, you'll also get some
information about the relationships between places.

* `places.country_regions` _regions broken down by country_
* `places.country_cities` _cities broken down by country_
* `places.address_strings` _city, region, country strings useful for geocoding_
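
To make the shape of these attributes more concrete, here is a minimal sketch in plain Python that builds comparable structures from hand-written sample data. It does not call geograpy3; the `(city, region, country)` triples, the helper names, and the assumption that the breakdowns are dictionaries keyed by country are all illustrative.

```python
from collections import defaultdict

# Hand-written sample data: (city, region, country) triples.
sample = [
    ("Cleveland", "Ohio", "United States"),
    ("Columbus", "Ohio", "United States"),
    ("Munich", "Bavaria", "Germany"),
]


def group_by_country(triples, index):
    """Group the value at `index` (0 = city, 1 = region) under its country."""
    grouped = defaultdict(list)
    for triple in triples:
        value, country = triple[index], triple[2]
        if value not in grouped[country]:  # avoid listing a region twice
            grouped[country].append(value)
    return dict(grouped)


def address_strings(triples):
    """Join each triple into a 'city, region, country' string for a geocoder."""
    return [", ".join(triple) for triple in triples]


country_cities = group_by_country(sample, 0)
country_regions = group_by_country(sample, 1)
print(country_cities)
print(country_regions)
print(address_strings(sample))
```

Strings in the "city, region, country" form are convenient because they can be passed as-is to a geocoding service.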

## Last But Not Least

While a text might mention many places, it's probably focused on one or two, so
geograpy3 also breaks down countries, regions and cities by number of mentions.

* `places.country_mentions`
* `places.region_mentions`
* `places.city_mentions`

Each of these is a list of tuples. The first item in each tuple is the place
name and the second item is the number of mentions. For example:

    [('Russian Federation', 14), ('Ukraine', 11), ('Lithuania', 1)]
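
A list in this shape is what `collections.Counter.most_common()` from the Python standard library produces, so the idea can be sketched in a few lines. The place names below are made up for illustration, and this is not necessarily how geograpy3 counts mentions internally.

```python
from collections import Counter

# Made-up sequence of place names as they might occur in a text.
found = ["Ukraine", "Russian Federation", "Ukraine", "Lithuania", "Ukraine",
         "Russian Federation"]

# most_common() returns (name, count) tuples, highest count first.
mentions = Counter(found).most_common()
print(mentions)

# The first tuple identifies the place the text mentions most often.
top_place, top_count = mentions[0]
print(top_place, top_count)
```

Taking the first one or two tuples gives a simple estimate of which places a text is mainly about.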

## If You're Really Serious

You can of course use each of geograpy3's modules on its own. For example:
```python
from geograpy import extraction

e = extraction.Extractor(url='https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay')
e.find_geoEntities()

# You can now access all of the places found by the Extractor
print(e.places)
```

Place context is handled in the `places` module. For example:

```python
from geograpy import places

pc = places.PlaceContext(['Cleveland', 'Ohio', 'United States'])

pc.set_countries()
print(pc.countries)  # ['United States']

pc.set_regions()
print(pc.regions)  # ['Ohio']

pc.set_cities()
print(pc.cities)  # ['Cleveland']

print(pc.address_strings)  # ['Cleveland, Ohio, United States']
```

And of course all of the other information shown above (`country_regions` etc)
is available after the corresponding `set_` method is called.

## Stack Overflow
* [Questions tagged with 'geograpy'](https://stackoverflow.com/questions/tagged/geograpy)

## Credits

geograpy3 uses the following excellent libraries:

* [NLTK](http://www.nltk.org/) for entity recognition
* [newspaper](https://github.com/codelucas/newspaper) for text extraction from HTML
* [jellyfish](https://github.com/sunlightlabs/jellyfish) for fuzzy text matching
* [pylodstorage](https://pypi.org/project/pylodstorage/) for storage and retrieval of tabular data from SQL and SPARQL sources

geograpy3 uses the following data sources:
* [ISO3166ErrorDictionary](https://github.com/bodacea/countryname/blob/master/countryname/databases/ISO3166ErrorDictionary.csv) for common country misspellings _via [Sara-Jayne Terp](https://github.com/bodacea)_
* [Wikidata](https://www.wikidata.org) for country/region/city information with disambiguation via population

Hat tip to [Chris Albon](https://github.com/chrisalbon) for the name.

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/somnathrakshit/geograpy3",
    "name": "geograpy3",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Somnath Rakshit",
    "author_email": "somnath52@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/67/81/4f3cf76cdf4118aaaf594f2d1f34c220ecacf87574a7d42a4a690caefbed/geograpy3-0.2.7.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "Apache",
    "summary": "Extract countries, regions and cities from a URL or text",
    "version": "0.2.7",
    "project_urls": {
        "Code": "https://github.com/somnathrakshit/geograpy3",
        "Documentation": "https://geograpy3.readthedocs.io",
        "Download": "https://github.com/somnathrakshit/geograpy3",
        "Homepage": "https://github.com/somnathrakshit/geograpy3",
        "Issue tracker": "https://github.com/somnathrakshit/geograpy3/issues"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "bae1a6f68aa31163c2572e2b24f63783d1c10aada78388ae8bc798b679cb1539",
                "md5": "e8f2631023f627934ec9bb50a1e12497",
                "sha256": "a290a5e95e3320b49abd5fa4810bfb890924e40cf30846073cb9e8492b46f3bb"
            },
            "downloads": -1,
            "filename": "geograpy3-0.2.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "e8f2631023f627934ec9bb50a1e12497",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 35434,
            "upload_time": "2023-09-29T08:23:45",
            "upload_time_iso_8601": "2023-09-29T08:23:45.199833Z",
            "url": "https://files.pythonhosted.org/packages/ba/e1/a6f68aa31163c2572e2b24f63783d1c10aada78388ae8bc798b679cb1539/geograpy3-0.2.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "67814f3cf76cdf4118aaaf594f2d1f34c220ecacf87574a7d42a4a690caefbed",
                "md5": "7c583fcec5415d7dae89505d36989933",
                "sha256": "9fed9fd8e1cf3757d5d696cdf2670241e61dc03731b4de7c2c053b98b0c5954c"
            },
            "downloads": -1,
            "filename": "geograpy3-0.2.7.tar.gz",
            "has_sig": false,
            "md5_digest": "7c583fcec5415d7dae89505d36989933",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 51651,
            "upload_time": "2023-09-29T08:23:46",
            "upload_time_iso_8601": "2023-09-29T08:23:46.749545Z",
            "url": "https://files.pythonhosted.org/packages/67/81/4f3cf76cdf4118aaaf594f2d1f34c220ecacf87574a7d42a4a690caefbed/geograpy3-0.2.7.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-09-29 08:23:46",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "somnathrakshit",
    "github_project": "geograpy3",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [
        {
            "name": "newspaper3k",
            "specs": [
                [
                    ">=",
                    "0.2.8"
                ]
            ]
        },
        {
            "name": "nltk",
            "specs": [
                [
                    ">=",
                    "3.7"
                ]
            ]
        },
        {
            "name": "jellyfish",
            "specs": [
                [
                    ">=",
                    "0.9.0"
                ]
            ]
        },
        {
            "name": "numpy",
            "specs": [
                [
                    ">=",
                    "1.21.4"
                ]
            ]
        },
        {
            "name": "pylodstorage",
            "specs": [
                [
                    ">=",
                    "0.4.7"
                ]
            ]
        },
        {
            "name": "sphinx-rtd-theme",
            "specs": []
        },
        {
            "name": "scikit-learn",
            "specs": [
                [
                    ">=",
                    "1.0.2"
                ]
            ]
        },
        {
            "name": "pandas",
            "specs": [
                [
                    ">=",
                    "1.3.5"
                ]
            ]
        },
        {
            "name": "geopy",
            "specs": []
        },
        {
            "name": "OSMPythonTools",
            "specs": [
                [
                    ">=",
                    "0.3.3"
                ]
            ]
        }
    ],
    "lcname": "geograpy3"
}
        