cleanurl


| Field | Value |
| --- | --- |
| Name | cleanurl |
| Version | 0.1.15 |
| Home page | https://github.com/xojoc/cleanurl |
| Summary | Remove clutter from URLs and return a canonicalized version |
| Upload time | 2023-03-21 21:32:03 |
| Author | Alexandru Cojocaru |
| Requires Python | >=3.9,<4.0 |
| License | AGPL-3.0-or-later |
| Keywords | url, canonical |
| Requirements | No requirements were recorded. |

# cleanurl
Remove clutter from URLs and return a canonicalized version

# Install
```
pip install cleanurl
```
or, if you're using Poetry:
```
poetry add cleanurl
```

# Usage
By default, *cleanurl* returns a cleaned URL without respecting semantics.
For example:

```
>>> import cleanurl
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/focus.html?utm_content=buffercf3b2&utm_medium=social&utm_source=snapchat.com&utm_campaign=buffe')
>>> r.url
'https://xojoc.pw/blog/focus'
>>> r.parsed_url
ParseResult(scheme='https', netloc='xojoc.pw', path='/blog/focus', params='', query='', fragment='')
```

The default parameters are useful if you want a *canonical* URL and don't care whether the resulting URL is still valid.
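
As a rough illustration of why a canonical form is handy, the sketch below groups URLs that share the same canonical key. `group_by_canonical` and `strip_utm` are hypothetical helpers written for this example; `strip_utm` is only a standard-library stand-in that drops `utm_*` parameters and fragments, not the rules *cleanurl* actually applies.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url: str) -> str:
    """Stand-in canonicalizer (an assumption, not cleanurl's logic):
    drop utm_* query parameters and the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_by_canonical(
    urls: Iterable[str], canonicalize: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each canonical key to the original URLs that produced it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    return dict(groups)
```

With the library installed, a callable such as `lambda u: cleanurl.cleanurl(u).url` could presumably be passed as `canonicalize` instead of the stand-in.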

If you want a clean URL that is still valid, call it like this:

```
>>> r = cleanurl.cleanurl('https://www.xojoc.pw/blog/////focus.html', respect_semantics=True)
>>> r.url
'https://www.xojoc.pw/blog/focus.html'
```
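
To give a feel for the kind of cleanup that usually keeps a URL working, here is a minimal standard-library sketch that collapses repeated slashes in the path only. `collapse_slashes` is a hypothetical helper for this example and not how *cleanurl* is implemented; it just reproduces the transformation shown above.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path; leave host, query and fragment alone."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))
```

Only the path is touched, so a `//` inside a query string (for example a redirect target) is left as it is.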

```cleanurl.cleanurl``` parameters:

 - ```generic``` -> if True, don't use site-specific rules
 - ```respect_semantics``` -> if True, make sure the returned URL is still valid, although it may still contain some superfluous elements
 - ```host_remap``` -> whether to remap hosts. Example:
```
>>> import cleanurl
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=True).url
'https://twitter.com/i/status/1453753924960219145'
>>> cleanurl.cleanurl('https://threadreaderapp.com/thread/1453753924960219145', host_remap=False).url
'https://threadreaderapp.com/thread/1453753924960219145'
```
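
Conceptually, host remapping can be thought of as a small rule table keyed by host. The sketch below is an assumption about how such a table might look, not the library's actual code; `REMAP_RULES` and `remap_host` are hypothetical names, and the single rule simply mirrors the example above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule table: (host, path pattern, replacement template).
REMAP_RULES = [
    (
        "threadreaderapp.com",
        re.compile(r"^/thread/(\d+)/?$"),
        "https://twitter.com/i/status/{0}",
    ),
]


def remap_host(url: str, host_remap: bool = True) -> str:
    """Rewrite the URL using the first matching rule, or return it unchanged."""
    if not host_remap:
        return url
    parts = urlsplit(url)
    host = (parts.hostname or "").removeprefix("www.")
    for rule_host, pattern, template in REMAP_RULES:
        if host != rule_host:
            continue
        match = pattern.match(parts.path)
        if match:
            return template.format(*match.groups())
    return url
```

URLs whose host or path does not match any rule are returned untouched, which is also what happens when `host_remap` is False.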

For more examples see the [unit tests](https://github.com/xojoc/cleanurl/blob/main/src/test_cleanurl.py).


# Why?
While there are some libraries that handle general cases, this library has website-specific rules that normalize URLs more aggressively.

# Users
Initially used for [discu.eu](https://discu.eu).

[Discussions around the web](https://discu.eu/q/https://github.com/xojoc/cleanurl)

# Who?
*cleanurl* was written by [Alexandru Cojocaru](https://xojoc.pw).

# License
*cleanurl* is [Free Software](https://www.gnu.org/philosophy/free-sw.html) and is released under the [AGPLv3](https://github.com/xojoc/cleanurl/blob/main/LICENSE).
            

Raw data

{
    "_id": null,
    "home_page": "https://github.com/xojoc/cleanurl",
    "name": "cleanurl",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.9,<4.0",
    "maintainer_email": "",
    "keywords": "url,canonical",
    "author": "Alexandru Cojocaru",
    "author_email": "hi@xojoc.pw",
    "download_url": "https://files.pythonhosted.org/packages/92/fb/bf71e2b1060f36fb26f1b62f26f8a9d27c13a95b9a86310118f963071619/cleanurl-0.1.15.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "AGPL-3.0-or-later",
    "summary": "Remove clutter from URLs and return a canonicalized version",
    "version": "0.1.15",
    "split_keywords": [
        "url",
        "canonical"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "e6d932b98ad854a35cde655f462d0d0fc55ae052188eb54c7c835dfb8dd0b35e",
                "md5": "bbb78e4c47d93892e1252e7af9e817d2",
                "sha256": "24edd6f8d4d01b8781c709b122e0f0d55defa081535ef416f7f04aaedf9bde7a"
            },
            "downloads": -1,
            "filename": "cleanurl-0.1.15-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "bbb78e4c47d93892e1252e7af9e817d2",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9,<4.0",
            "size": 18637,
            "upload_time": "2023-03-21T21:32:01",
            "upload_time_iso_8601": "2023-03-21T21:32:01.198359Z",
            "url": "https://files.pythonhosted.org/packages/e6/d9/32b98ad854a35cde655f462d0d0fc55ae052188eb54c7c835dfb8dd0b35e/cleanurl-0.1.15-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "92fbbf71e2b1060f36fb26f1b62f26f8a9d27c13a95b9a86310118f963071619",
                "md5": "dedb6c91e75b7d7c9e4279b620e385fe",
                "sha256": "e05e9fe59491a5df51dd4a08015d82259cdd1c2fe2f6b573205d8ec09877bbaa"
            },
            "downloads": -1,
            "filename": "cleanurl-0.1.15.tar.gz",
            "has_sig": false,
            "md5_digest": "dedb6c91e75b7d7c9e4279b620e385fe",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9,<4.0",
            "size": 18287,
            "upload_time": "2023-03-21T21:32:03",
            "upload_time_iso_8601": "2023-03-21T21:32:03.225420Z",
            "url": "https://files.pythonhosted.org/packages/92/fb/bf71e2b1060f36fb26f1b62f26f8a9d27c13a95b9a86310118f963071619/cleanurl-0.1.15.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-21 21:32:03",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "xojoc",
    "github_project": "cleanurl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "cleanurl"
}
        