YURL

Name	YURL
Version	1.0.0
home_page	http://github.com/homm/yurl/
Summary	Yurl is alternative url manipulation library
upload_time	2019-04-18 16:08:57
docs_url	None
author	Aleksadr Karpinsky
requirements	No requirements were recorded.
coveralls test coverage	No coveralls.

====================================
Alternative URL manipulation library
====================================

Yurl is a replacement for Python's built-in urlparse module.
Key features of yurl are:

* Pythonic API
* better compliance with RFC 3986
* good performance
* support for Python 2.6, 2.7, 3.2, 3.3 and PyPy 1.9 with a single codebase

Yurl is inspired by purl, a Pythonic interface to urlparse.

===
API
===


Parsing
-------

To parse a URL into parts, pass a string as the first argument to the URL() constructor:

    >>> from yurl import URL
    >>> URL('https://www.google.ru/search?q=yurl')
    URLBase(scheme='https', userinfo=u'', host='www.google.ru', port='',
     path='/search', query='q=yurl', fragment='', decoded=False)

It also works with relative URLs:

    >>> URL('search?rls=en&q=yurl&redir_esc=')
    URLBase(scheme=u'', userinfo=u'', host=u'', port='', path='search',
     query='rls=en&q=yurl&redir_esc=', fragment='', decoded=False)

A URL can also be constructed from known parts:

    >>> print URL(host='google.com', path='search', query='q=url')
    //google.com/search?q=url
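For comparison, the five top-level components can be separated with the regular expression given in RFC 3986, Appendix B. The sketch below uses only the standard library and is written for Python 3; the helper name ``split_uri`` is made up for illustration and this is not how yurl itself is implemented.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)


def split_uri(uri):
    """Split a URI reference into (scheme, authority, path, query, fragment)."""
    match = URI_RE.match(uri)
    parts = match.group(2, 4, 5, 7, 9)
    # Components that are absent come back as None; normalise them to ''.
    return tuple(part or '' for part in parts)


print(split_uri('https://www.google.ru/search?q=yurl'))
print(split_uri('search?rls=en&q=yurl&redir_esc='))
```

Like URL(), this split never fails: every string matches the expression, so checking the parts is left to a separate validation step.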


Validation
----------

URL parsing always succeeds, even if some parts contain unescaped or
disallowed chars. After parsing, you can call the validate() method:

    >>> URL('//google:com').validate()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "yurl.py", line 201, in validate
        raise InvalidHost()
    yurl.InvalidHost

validate() returns the object itself or a modified version:

    >>> URL('//google.com:80').validate()
    URLBase(scheme=u'', userinfo=u'', host='google.com', port='80',
     path='', query='', fragment='', decoded=False)


Get information
---------------

URL() returns a named tuple with some additional properties. All properties
are strings, even if the corresponding part does not exist in the URL.

.scheme .authority .path .query .fragment
    Basic parts of url: *scheme://authority/path?query#fragment*

.userinfo .host .port
    Parts of authority: *userinfo@host:port*
    Port is guaranteed to consist of digits.

.full_path
    Path, query and fragment joined together: *path?query#fragment*

.username .authorization
    Parts of userinfo: *username:authorization*
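The idea of a named tuple with derived properties can be sketched with the standard library alone. The class below is a hypothetical stand-in written for Python 3, not yurl's actual class; it only assumes the field order scheme, userinfo, host, port, path, query, fragment.

```python
from collections import namedtuple

_Fields = namedtuple(
    '_Fields', 'scheme userinfo host port path query fragment'
)


class Parts(_Fields):
    """Hypothetical stand-in for a parsed URL; not yurl's own class."""

    __slots__ = ()

    @property
    def authority(self):
        # userinfo@host:port, leaving out delimiters of empty parts.
        result = self.host
        if self.userinfo:
            result = self.userinfo + '@' + result
        if self.port:
            result = result + ':' + self.port
        return result

    @property
    def full_path(self):
        # path?query#fragment, leaving out delimiters of empty parts.
        result = self.path
        if self.query:
            result += '?' + self.query
        if self.fragment:
            result += '#' + self.fragment
        return result


parts = Parts('https', 'user', 'ya.ru', '80', '/a/b', 'd=f', 'h')
print(parts.authority)
print(parts.full_path)
```

Because the object is still a tuple, it can be unpacked, compared and hashed like any other tuple, while the properties are computed on demand.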

The URL object has a method for checking whether an authority is present:

    >>> URL('http://google.com:80').has_authority()
    True

You can also check whether a URL is relative:

    >>> URL('http://google.com:80').is_relative()
    False
    >>> URL('//google.com:80').is_relative()
    True

Or whether it has a relative path:

    >>> URL('scheme:path').is_relative_path()
    False
    >>> URL('./path').is_relative_path()
    True

You can also check whether the URL's host is an IP address:

    >>> URL('//127-0-0-1/').is_host_ip()
    False
    >>> URL('//127.0.0.1/').is_host_ip()
    True
    >>> URL('//[::ae21:ad12]/').is_host_ip()
    True
    >>> URL('//[::ae21:ad12]/').is_host_ipv4()
    False

The IP address is not validated, so it is recommended to call the validate() method first:

    >>> URL('//[+ae21:ad12]/').is_host_ip()
    True
    >>> URL('//[+ae21:ad12]/').validate().is_host_ip()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "yurl.py", line 197, in validate
        raise InvalidHost()
    yurl.InvalidHost
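The difference between "looks like an IP" and "is a valid IP" can be illustrated with Python 3's standard ``ipaddress`` module. This is only a sketch under that assumption: the function name ``is_valid_ip_host`` is hypothetical, and the check ignores the IPvFuture form that RFC 3986 also allows inside brackets.

```python
import ipaddress


def is_valid_ip_host(host):
    """Return True for a valid IPv4 literal or a bracketed IPv6 literal."""
    if host.startswith('[') and host.endswith(']'):
        # Bracketed hosts are expected to hold an IPv6 address.
        candidate, expected_version = host[1:-1], 6
    else:
        candidate, expected_version = host, 4
    try:
        return ipaddress.ip_address(candidate).version == expected_version
    except ValueError:
        # Not parseable as an IP address at all.
        return False


for host in ('127.0.0.1', '127-0-0-1', '[::ae21:ad12]', '[+ae21:ad12]'):
    print(host, is_valid_ip_host(host))
```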


Modify urls
-----------

After parsing, a URL can be modified in different ways.

replace() method
~~~~~~~~~~~~~~~~

You can use the replace() method to change whole parts of a URL:

    >>> print URL('http://ya.ru/').replace(scheme='https')
    https://ya.ru/
    >>> print URL('http://ya.ru/?q=yurl').replace(query='')
    http://ya.ru/

In addition to the usual attributes, it accepts the shortcuts authority and full_path:

    >>> print URL('http://user@ya.ru:80/?q=yurl')\
    ... .replace(authority='google.com', full_path='two')
    http://google.com/two

setdefault() method
~~~~~~~~~~~~~~~~~~~

setdefault() sets the given parts only if they are absent from the original URL:

    >>> print URL('https://google.com').setdefault(scheme='http', path='q')
    https://google.com/q
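The "fill in only what is missing" rule can be sketched on a plain named tuple with ``_replace()``. The ``Parts`` tuple and the free function ``setdefault`` below are hypothetical, written for Python 3, and much simpler than yurl: in particular they do not turn the path ``q`` into ``/q`` the way the example above does.

```python
from collections import namedtuple

# A deliberately small, hypothetical set of fields.
Parts = namedtuple('Parts', 'scheme host path')


def setdefault(parts, **defaults):
    """Return a copy of parts with defaults applied to empty fields only."""
    missing = {
        name: value
        for name, value in defaults.items()
        if not getattr(parts, name)
    }
    return parts._replace(**missing)


url = Parts(scheme='https', host='google.com', path='')
print(setdefault(url, scheme='http', path='q'))
```

The existing scheme wins over the default, while the empty path is filled in.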

Url join
~~~~~~~~

Join is the analogue of the urljoin() function from the urlparse module. You
can join two URLs by adding one to the other:

    >>> print URL('http://ya.ru/path#chap2') + URL('search?q=some')
    http://ya.ru/search?q=some

Joining relative URLs is also supported:

    >>> print URL('path/to/object#chap2') + URL('../from/object')
    path/from/object

Join is not a commutative operation:

    >>> print URL('../from/object') + URL('path/to/object#chap2')
    from/path/to/object#chap2

And it is not associative in general:

    >>> print (URL('//google/path/to') + URL('../../object')) + URL('path')
    //google/path
    >>> print URL('//google/path/to') + (URL('../../object') + URL('path'))
    //google/path/path
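The path part of a join comes down to two steps from RFC 3986, section 5.2: merge the reference with the base path, then remove dot segments. The sketch below covers absolute base paths only; the names ``remove_dot_segments`` and ``join_paths`` are chosen for illustration and this is not yurl's code.

```python
def remove_dot_segments(path):
    """Collapse '.' and '..' segments of an absolute path (RFC 3986, 5.2.4)."""
    segments = path.split('/')
    output = []
    for segment in segments:
        if segment == '.':
            continue
        if segment == '..':
            # Never pop the leading empty string that stands for the root.
            if len(output) > 1:
                output.pop()
            continue
        output.append(segment)
    # A trailing '.' or '..' names a directory, so keep the final slash.
    if segments[-1] in ('.', '..'):
        output.append('')
    return '/'.join(output)


def join_paths(base, reference):
    """Resolve a path reference against an absolute base path."""
    if reference.startswith('/'):
        merged = reference
    else:
        # Drop everything after the last slash of the base, then append.
        merged = base[:base.rfind('/') + 1] + reference
    return remove_dot_segments(merged)


print(join_paths('/path/to', '../../object'))
print(join_paths('/a/b/c/d', '../x/./y'))
```

Swapping the arguments changes which path supplies the directory and which supplies the final segment, which is why the operation is not commutative.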


Decode url
----------

All chars in a URL are divided into three groups: delimiters, subdelimiters and
unreserved chars. Unreserved chars do not affect parsing and can be encoded
or decoded at any time. To decode unreserved chars, call the decode()
method. The default encoding is utf-8.

    >>> url = '%D1%81%D1%85%D0%B5%D0%BC%D0%B0%3A%D0%BF%D1%83%D1%82%D1%8C'
    >>> print URL(url).decode()
    схема%3Aпуть

If you want to decode all chars, apply the decode_url_component()
function to a URL component:

    >>> from yurl import decode_url_component
    >>> print decode_url_component(URL(url).decode().path)
    схема:путь

You can also skip the decode() method if you pass an encoding to decode_url_component():

    >>> print decode_url_component(url, 'utf-8')
    схема:путь

If you do not pass an encoding, only reserved chars are decoded:

    >>> print decode_url_component(url)
    %D1%81%D1%85%D0%B5%D0%BC%D0%B0:%D0%BF%D1%83%D1%82%D1%8C
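The first step, decoding everything except the delimiters, can be approximated in Python 3 with the standard library. In the sketch below the name ``decode_except_reserved`` and the exact ``RESERVED`` set are assumptions; yurl's own rules may differ, for example for spaces and control characters.

```python
import re
from urllib.parse import unquote_to_bytes

# gen-delims and sub-delims from RFC 3986, plus '%' itself.
RESERVED = b":/?#[]@!$&'()*+,;=%"

# One or more consecutive percent-encoded bytes.
ENCODED_RUN = re.compile(r'(?:%[0-9A-Fa-f]{2})+')


def decode_except_reserved(text, encoding='utf-8'):
    """Percent-decode text but leave reserved chars percent-encoded."""

    def replace(match):
        raw = unquote_to_bytes(match.group(0))
        result = bytearray()
        for byte in raw:
            if byte in RESERVED:
                # Put the escape back so parsing is not affected.
                result += b'%%%02X' % byte
            else:
                result.append(byte)
        return result.decode(encoding)

    return ENCODED_RUN.sub(replace, text)


url = '%D1%81%D1%85%D0%B5%D0%BC%D0%B0%3A%D0%BF%D1%83%D1%82%D1%8C'
print(decode_except_reserved(url))
```

Decoding a whole run of escapes at once matters for multi-byte encodings such as utf-8, where a single character spans several percent-encoded bytes.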

Cache url parsing
-----------------

The original urlparse() caches every parsed URL. In most cases this is unnecessary.
But if you parse the same link again and again, you can use CachedURL:

    >>> CachedURL('http://host') is CachedURL('http://host')
    True
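The same identity guarantee can be reproduced with a memoising wrapper. This is a minimal sketch for Python 3 built on ``functools.lru_cache`` and the standard ``urlsplit()``; the function name ``cached_split`` and the cache size are arbitrary, and nothing here describes how CachedURL is implemented.

```python
from functools import lru_cache
from urllib.parse import urlsplit


@lru_cache(maxsize=1024)
def cached_split(url):
    """Parse url once and return the same result object on repeat calls."""
    return urlsplit(url)


first = cached_split('http://host')
second = cached_split('http://host')
print(first is second)
print(cached_split.cache_info().hits)
```

Returning the same object is safe here because the parsed result is an immutable tuple.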

=============
About library
=============


Decisions
---------

The RFC defines the format of a valid URL and the ways to interact with it. But
sometimes we need to work with invalid URLs, and the RFC is not much help there.
So this library has to make a number of decisions of its own.

*   Many libraries do not allow a scheme or authority with invalid chars. The
    RFC unambiguously defines the format of these parts. So we can say that
    'sche_me' in 'sche_me:path' cannot be a scheme because of the underscore,
    and the whole string should be parsed as a path:

    >>> urlsplit('sche_me:path')[:]
    ('', '', 'sche_me:path', '', '')

    The problem is that the RFC also says that the first segment of the path
    cannot contain a colon. I believe the right way is to split the URL as is
    and then validate if necessary:

    >>> urlsplit('sche_me:path')[:]
    ('sche_me', '', 'path', '', '')

*   The RFC defines two operations on a URL: parse and join. Since we can also
    construct a URL from parts and replace parts, we sometimes have to fix
    these parts. For example, a URL with an authority cannot have a relative
    path, and a relative URL cannot start with // or contain : in the first
    path segment. These fixes can be applied either when the URL is
    constructed or when it is recomposed. The first way may be wrong because
    it can apply a fix that later turns out to be unnecessary:

    >>> # This is an example of wrong behavior.
    >>> print URL("//host") + URL(path="//path")
    //host////path  # now the path has four slashes

    The second way is wrong when we replace some parts:

    >>> # This is an example of wrong behavior.
    >>> print URL("rel/path").replace(host='host').path
    rel/path  # the path is relative even though a host is present

    So I divide all fixes into real fixes:

    >>> # the path cannot be relative when a host is present
    >>> print URL("rel/path").replace(host='host').path
    /rel/path

    And escapes, which should be applied on recomposition:

    >>> # a URL that starts with a path cannot contain ':' in the first path segment
    >>> print URL(path="rel:path")
    ./rel:path
    >>> print URL(path="rel:path").path
    rel:path
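The "escape on recomposition" idea can be sketched as a function that builds the string from parts and adds ``./`` only in the output, leaving the stored path untouched. The function name ``recompose`` and its signature are hypothetical, the code is written for Python 3, and the slash fix for paths under an authority is deliberately left out.

```python
def recompose(scheme='', authority='', path='', query='', fragment=''):
    """Build a URL string from parts, escaping the path only in the output."""
    first_segment = path.split('/', 1)[0]
    if not scheme and not authority and ':' in first_segment:
        # Without this prefix the first segment would read back as a scheme.
        path = './' + path
    result = ''
    if scheme:
        result += scheme + ':'
    if authority:
        result += '//' + authority
    result += path
    if query:
        result += '?' + query
    if fragment:
        result += '#' + fragment
    return result


print(recompose(path='rel:path'))
print(recompose(scheme='http', authority='ya.ru', path='/', query='q=yurl'))
```

Since the prefix is added only while building the string, the caller's ``path`` value stays exactly as given.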


Why you might want to use yurl instead of urlparse
--------------------------------------------------

The short answer is that urlparse is broken. If you're interested, here is the
detailed answer.

*   The urlparse module has two functions: urlparse() and urlsplit(). Unlike
    urlsplit(), urlparse() also separates params from the path. Params are not
    part of most schemes, and in the latest RFC they are not a part of the URL
    at all; instead, each path segment can have its own params. The problem is
    that most programmers use urlparse() and ignore params when extracting the
    path:

    >>> import purl
    >>> print purl.URL('/path;with?semicolon')
    /path?semicolon

*   urlsplit() has strange parameters. It takes a default addressing scheme,
    yet the scheme is the only part that can have a default value in urlsplit().

*   Another parameter, allow_fragments, can be used to prevent splitting
    #fragment from the path. The problem is that we cannot say «I do not want
    a fragment in this URL». If a URL contains '#', it contains a fragment. If
    the scheme does not allow a fragment, '#' still cannot be used in other
    parts. The caller has a choice: ignore the fragment or raise an error. But
    a URL cannot be parsed while ignoring '#':

    >>> urlparse('/path#frag:ment?query').query
    ''
    >>> urlparse('/path#frag:ment?query', allow_fragments=False).query
    'query'

*   The module makes no distinction between parsing and validating. For
    example, urlsplit() checks for allowed chars in the scheme and raises on an
    invalid IPv6 URL:

    >>> urlsplit('not_scheme://google.com').path
    'not_scheme://google.com'
    >>> urlsplit('//ho[st/')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/urlparse.py", line 211, in urlsplit
        raise ValueError("Invalid IPv6 URL")
    ValueError: Invalid IPv6 URL

    But it ignores other errors:

    >>> urlsplit('//host@with@butterflies').username
    'host@with'
    >>> urlsplit('//butterflies[]:80').port
    80

*   It doesn't understand my favorite scheme:

    >>> urlsplit('lucky-number:33')[:]
    ('', '', 'lucky-number:33', '', '')

*   It loses a path that starts with two slashes:

    >>> urlsplit('////path')[:]
    ('', '', '//path', '', '')
    >>> urlsplit(urlsplit('////path').geturl())[:]
    ('', 'path', '', '', '')

*   The urljoin() function is sometimes broken:

    >>> urljoin('http://host/', '../')
    'http://host/../'
    >>> print URL('http://host/') + URL('../')
    http://host

I'm sure the list is not complete.
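Two of the points above, params handling and allow_fragments, can be tried out in Python 3, where the urlparse module lives on as ``urllib.parse``. The snippet is only a sketch for experimenting with the standard library; behaviour in the other items may differ between Python versions, so they are not reproduced here.

```python
from urllib.parse import urlparse, urlsplit

url = '/path;with?semicolon'

# urlparse() moves everything after ';' out of the path into params.
parsed = urlparse(url)
print(repr(parsed.path), repr(parsed.params), repr(parsed.query))

# urlsplit() keeps the path intact.
split = urlsplit(url)
print(repr(split.path), repr(split.query))

# With allow_fragments=False the '?' inside the fragment starts a query.
tricky = '/path#frag:ment?query'
print(repr(urlsplit(tricky).query))
print(repr(urlsplit(tricky, allow_fragments=False).query))
```

Code that reads only ``parsed.path`` silently drops ``;with``, which is the pitfall described in the first item.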


Why you might want to use yurl instead of purl
----------------------------------------------

Purl is built on top of urlparse() and inherits almost all of the problems
listed above. And it adds some others:

*   Purl parsing is about 2 times slower than urlparse(), while yurl parsing
    is about 2 times faster than urlparse().

*   Purl manipulations are about 20 times slower than yurl:

    >>> from timeit import timeit
    >>> timeit("url.scheme('https')", "import purl; url = purl.URL('http://google.com/')", number=10000)
    0.4427049160003662
    >>> timeit("url.replace(scheme='https')", "import yurl; url = yurl.URL('http://google.com/')", number=10000)
    0.020306110382080078

*   Purl has an ugly jQuery-like API, where one method may return different
    objects depending on the arguments.

*   Purl parsing is dangerous:

    >>> purl.URL('//@host')
    ValueError: need more than 1 value to unpack
    >>> purl.URL('//host:/')
    ValueError: invalid literal for int() with base 10: ''
    >>> purl.URL('//user:pass:word@host')
    ValueError: too many values to unpack

*   Purl loses the path after ';', although ';' is a valid char in a URL:

    >>> print purl.URL('/path;with?semicolon')
    /path?semicolon

*   Purl loses host in relative urls:

    >>> print purl.URL('//google.com/path?query')
    google.com/path?query

*   Purl loses a username with an empty password and a password with an empty username:

    >>> print purl.URL('http://user:@google.com/')
    http://google.com/


More about performance
-----------------------

Yurl comes with a bunch of performance tests. Results may vary depending on the
Python version and the CPU:

::

    $ python2.7 ./test.py -bench

    === Test as string ===
      yurl usplit uparse   purl
     12.01  9.783  11.94  27.08 !worse  https://user:info@yandex.ru:8080/path/to+the=ar?gum=ent#s
     8.533  21.89  23.82  18.88   scheme:8080/path/to;the=ar?gum=ent#s
     10.12  3.879  9.007  12.21 !worse  re/ative:path;with?query
     5.268   2.39  4.043  10.26 !worse  lucky-number:3456
     4.806  3.662  5.349  13.73 !worse  //host:80
     4.953  3.342  4.885   13.2 !worse  #frag

    === Manipulations speed ===
      noop   yurl
    0.0751  178.9   https://habrahabr.ru:80/a/b/c?d=f#h

    === Test join ===

      = result is string =
      yurl  ujoin
     111.6  127.2   u'http://ya.ru/user/photos/id12324/photo3' + u'../../../mikhail/photos/id6543/photo99?param'
     85.87  71.06 !worse  u'http://ya.ru/user/photos/id12324' + u'#fragment'
     82.12  100.8   u'http://ya.ru/' + u'https://google.com/?q=yurl'

      = result is parsed =
      yurl  ujoin
     102.6  181.3   u'http://ya.ru/user/photos/id12324/photo3' + u'../../../mikhail/photos/id6543/photo99?param'
     73.15  125.7   u'http://ya.ru/user/photos/id12324' + u'#fragment'
     76.26  184.3   u'http://ya.ru/' + u'https://google.com/?q=yurl'

    === Test parse ===

      = dupass cache =
      yurl usplit uparse   purl
     36.25  73.31  85.91  166.5   https://user:info@yandex.ru:8080/path/to+the=ar?gum=ent#s
     20.34  58.84  77.29  138.9   scheme:8080/path/to;the=ar?gum=ent#s
     18.25  33.21  48.72  109.3   re/ative:path;with?query
      19.3  66.77  76.16  135.5   lucky-number:3456
      24.0  35.57  43.36  119.2   //host:80
      18.0  25.57  37.78  114.4   #frag

      = with cache =
      yurl usplit uparse   purl
     9.902  14.43  24.04  95.92   https://user:info@yandex.ru:8080/path/to+the=ar?gum=ent#s
     5.726  7.211  23.14  79.94   scheme:8080/path/to;the=ar?gum=ent#s
     5.497  6.804  22.86  80.93   re/ative:path;with?query
     5.357  6.521  14.72   72.0   lucky-number:3456
     5.076  6.763  14.12  87.39   //host:80
     5.824  7.993  26.78  73.03   #frag

Tests in which any of the other libraries beats yurl are marked with the
"!worse" marker.
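A comparable measurement can be taken with the standard ``timeit`` module. The harness below is a rough sketch for Python 3, not the bundled test.py: the ``bench`` helper and the choice of microseconds are assumptions, and because ``urlsplit()`` caches its results, it measures the cached case.

```python
from timeit import timeit
from urllib.parse import urlsplit

URLS = [
    'https://user:info@yandex.ru:8080/path/to+the=ar?gum=ent#s',
    'lucky-number:3456',
    '//host:80',
    '#frag',
]


def bench(func, url, number=10000):
    """Return the average time of one func(url) call in microseconds."""
    total = timeit(lambda: func(url), number=number)
    return total / number * 1e6


for url in URLS:
    print('%8.3f  %s' % (bench(urlsplit, url), url))
```

Absolute numbers from such a loop only make sense relative to each other on the same machine and interpreter.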


Changelog
---------

v0.13
~~~~~

* fixed installation on non-UTF-8 systems

v0.12
~~~~~

* added URLError exception on top of ValueError

v0.11
~~~~~

* decode() method
* username and authorization properties
* order of tuple members now same as url parts:
  scheme, userinfo, host, port, path, query, fragment
* raw url parsing was moved to split_url() function of utils module

v0.10
~~~~~

* method replace_from() removed
* concatenation with a string is no longer aliased to join
* join always removes dot segments (as defined in the RFC)

v0.9
~~~~

First release.

Raw data

{
    "_id": null,
    "home_page": "http://github.com/homm/yurl/",
    "name": "YURL",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Aleksadr Karpinsky",
    "author_email": "homm86@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/2c/c5/98f7359c9f53a9b122f0764b5a3a677495830f635ad9e50fb63534e1c908/YURL-1.0.0.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "",
    "summary": "Yurl is alternative url manipulation library",
    "version": "1.0.0",
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "md5": "e0f0e7e9e0e632740e5697d0c8922389",
                "sha256": "1b9497efc0b4f85af9e5d139fb3e93fca825bccf2c62d463aca489630a248619"
            },
            "downloads": -1,
            "filename": "YURL-1.0.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e0f0e7e9e0e632740e5697d0c8922389",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 15233,
            "upload_time": "2019-04-18T16:08:57",
            "upload_time_iso_8601": "2019-04-18T16:08:57.702842Z",
            "url": "https://files.pythonhosted.org/packages/2c/c5/98f7359c9f53a9b122f0764b5a3a677495830f635ad9e50fb63534e1c908/YURL-1.0.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2019-04-18 16:08:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "github_user": "homm",
    "github_project": "yurl",
    "travis_ci": true,
    "coveralls": false,
    "github_actions": false,
    "lcname": "yurl"
}
