urlparse4


Name: urlparse4
Version: 0.1.3
Home page: https://github.com/commonsearch/urlparse4
Summary: Performance-focused replacement for Python's urlparse module
Upload time: 2016-07-10 23:48:37
Maintainer: None
Docs URL: None
Author: Common Search contributors
Requires Python: None
License: Apache License, Version 2.0
Keywords: urlparse urlsplit urljoin url parser urlparser parsing gurl cython faster speed performance
Requirements: tabulate Cython pytest uritools YURL urlparse2 urlparse3 slimurl gurl-cython
Travis-CI: No Travis.

urlparse4
=========

``urlparse4`` is a performance-focused replacement for Python's
``urlparse`` module, using C++ code from Chromium's own URL parser.

It is not production-ready yet.

Much credit goes to
`gurl-cython <https://github.com/Preetwinder/gurl-cython>`__ for the
inspiration.

Differences with Python's ``urlparse``
--------------------------------------

``urlparse4`` should be a transparent, drop-in replacement in almost all
cases. Still, there are a few differences to be aware of:

-  ``urlparse4`` is 2-7x faster for most operations (see benchmarks
   below; against ``urlparse`` it is about 2.3x faster for ``urlsplit``,
   4.6x for relative link resolution and 2.7x for hostname extraction)
-  ``urlparse4`` currently doesn't pass CPython's ``test_urlparse.py``
   suite due to edge cases that Chromium's parser handles differently
   (usually in accordance with the RFCs, which ``urlparse`` doesn't fully
   follow).
-  ``urlparse4`` only supports Python 2.7 for now.
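
As a minimal sketch of the drop-in idea, code can prefer ``urlparse4`` and fall back to the standard library when it is not installed. This assumes ``urlparse4`` exports ``urlsplit`` and ``urljoin`` with the same signatures and return values as ``urlparse``, which the drop-in goal implies but this README does not spell out; the URLs are only examples.

```python
# Prefer urlparse4 when it is installed; otherwise fall back to the standard
# library. Assumption: urlparse4 exposes urlsplit/urljoin with the same
# signatures and return values as the stdlib functions.
try:
    from urlparse4 import urlsplit, urljoin
except ImportError:
    try:
        from urlparse import urlsplit, urljoin  # Python 2.7 standard library
    except ImportError:
        from urllib.parse import urlsplit, urljoin  # Python 3 standard library

parts = urlsplit("http://www.example.com:8080/docs/page.html?q=1#top")
joined = urljoin("http://www.example.com/docs/page.html", "other.html")

print(parts.hostname)  # host, lowercased
print(parts.port)      # port as an int
print(joined)          # relative link resolved against the base URL
```

Because of the edge cases listed above, results can differ from ``urlparse`` on unusual inputs, so it is worth running your own URLs through both before switching.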

How to test
-----------

You must have Docker installed and running. You can run CPython's test
suite for ``urlparse`` like this:

::

    make docker_build
    make docker_test
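
Outside the Docker workflow, a quick differential check against the standard library can flag the edge cases mentioned above on your own URLs. The helper ``find_mismatches`` and the sample URLs are hypothetical, and importing ``urlparse4.urlsplit`` assumes the drop-in API; the comparison only runs when that import succeeds.

```python
try:
    from urllib.parse import urlsplit as reference_urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit as reference_urlsplit  # Python 2.7


def _parse(parse, url):
    """Parse url into a plain tuple, or return the exception's repr if parsing fails."""
    try:
        return tuple(parse(url))
    except Exception as exc:  # record the failure so it can be compared too
        return repr(exc)


def find_mismatches(candidate, reference, urls):
    """Hypothetical helper: return (url, expected, actual) where the parsers disagree."""
    mismatches = []
    for url in urls:
        expected = _parse(reference, url)
        actual = _parse(candidate, url)
        if expected != actual:
            mismatches.append((url, expected, actual))
    return mismatches


SAMPLE_URLS = [
    "http://example.com/",
    "https://user:pw@example.com:8443/a/b?x=1#frag",
    "HTTP://Example.COM/Path",
]

try:
    from urlparse4 import urlsplit as candidate_urlsplit  # assumed drop-in API
except ImportError:
    candidate_urlsplit = None  # urlparse4 not installed: nothing to compare

if candidate_urlsplit is not None:
    for url, expected, actual in find_mismatches(
        candidate_urlsplit, reference_urlsplit, SAMPLE_URLS
    ):
        print("%s\n  urlparse:  %r\n  urlparse4: %r" % (url, expected, actual))
```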

Benchmarks
----------

We benchmark the following libraries on a sample of 100k URLs from
Blink and DMOZ:

-  urlparse4 ;-)
-  `CPython's
   urlparse <https://github.com/python/cpython/blob/2.7/Lib/urlparse.py>`__
-  `urlparse2 <https://github.com/mwhooker/urlparse2>`__
-  `YURL <http://github.com/homm/yurl/>`__
-  `uritools <https://github.com/tkem/uritools>`__
-  `pygurl / gurl-cython <https://github.com/Preetwinder/gurl-cython>`__
-  `cyuri <https://github.com/mitghi/cyuri>`__

Each library is tested on three types of operations: basic
``urlsplit``, relative link resolution (``urljoin``) and hostname
extraction.
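
The three operations can be sketched with the standard library as follows. The function names and the relative link ``sibling.html`` are assumptions for illustration; the benchmark's own code may resolve different relative links.

```python
try:
    from urllib.parse import urlsplit, urljoin  # Python 3
except ImportError:
    from urlparse import urlsplit, urljoin  # Python 2.7


def op_urlsplit(url):
    """Basic split into (scheme, netloc, path, query, fragment)."""
    return urlsplit(url)


def op_urljoin_sibling(url):
    """Resolve a relative link against url; "sibling.html" is a hypothetical link."""
    return urljoin(url, "sibling.html")


def op_hostname(url):
    """Extract the lowercased hostname (None if the URL has no network location)."""
    return urlsplit(url).hostname


OPERATIONS = {
    "urlsplit": op_urlsplit,
    "urljoin_sibling": op_urljoin_sibling,
    "hostname": op_hostname,
}

url = "http://www.Example.com/dir/index.html?page=2"
for name in sorted(OPERATIONS):
    print("%s: %r" % (name, OPERATIONS[name](url)))
```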

Here is how to run the benchmarks:

::

    make docker_build
    make docker_benchmark
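
A per-call timing loop of the kind that would produce Sum, Mean, Median and 90% figures can be sketched as below. This is not the project's benchmark script: it assumes each call is timed individually (which per-URL Median and 90% figures suggest) and uses a nearest-rank 90th percentile. ``benchmark`` and ``percentile`` are hypothetical names.

```python
import math
import timeit

try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = int(math.ceil(fraction * len(sorted_values)))
    return sorted_values[max(rank, 1) - 1]


def benchmark(operation, urls, repeat=10):
    """Time operation(url) once per URL per run.

    Returns (sum, mean, median, p90) in seconds; mean, median and p90
    are per-call figures, while sum covers every call in every run.
    """
    timings = []
    for _ in range(repeat):
        for url in urls:
            start = timeit.default_timer()
            operation(url)
            timings.append(timeit.default_timer() - start)
    timings.sort()
    total = sum(timings)
    n = len(timings)
    mid = n // 2
    median = timings[mid] if n % 2 else (timings[mid - 1] + timings[mid]) / 2.0
    return total, total / n, median, percentile(timings, 0.9)


urls = ["http://example.com/page%d.html?id=%d" % (i, i) for i in range(1000)]
results = benchmark(urlsplit, urls, repeat=3)
print("sum=%.6f mean=%.3e median=%.3e p90=%.3e" % results)
```

Per-call timings around a microsecond are close to the timer's resolution, which may explain why Median and 90% values cluster around whole microseconds.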

Current results on a 2.2 GHz Intel Core i7 MacBook Pro, in seconds (Sum
is the total time over all runs; Mean, Median and 90% are per-URL
timings):

::

    Benchmark results on 104300 URLs x 10 times, in seconds:

    Name              Sum            Mean               Median             90%
    ----------------  -------------  -----------------  -----------------  -----------------

    urlsplit:
    ----              ----           ----               ----               ----
    urlparse4         1.681858       1.61251965484e-06  1.99999999984e-06  2.00000000006e-06
    pygurl            2.031712       1.94795014382e-06  1.99999999984e-06  2.00000000028e-06
    uritools          2.638991       2.53019271333e-06  2.00000000028e-06  3.00000000042e-06
    yurl              3.910247       3.74903835091e-06  3.00000000131e-06  4.99999999981e-06
    urlparse2         3.756782       3.60190028763e-06  2.99999999953e-06  4.00000000056e-06
    urlparse          3.862006       3.70278619367e-06  3.00000000308e-06  4.99999999803e-06
    cyuri             9.912275       9.50361936721e-06  8.00000000112e-06  1.30000000027e-05

    urljoin_sibling:
    ----              ----           ----               ----               ----
    urlparse4         2.008453       1.92565004794e-06  2.00000000206e-06  2.00000000206e-06
    pygurl            2.193427       2.10299808245e-06  2.00000000206e-06  2.99999999953e-06
    uritools          10.575344      1.01393518696e-05  9.99999999607e-06  1.20000000052e-05
    yurl              13.213052      1.26683144775e-05  1.19999999981e-05  1.60000000022e-05
    urlparse2         14.239327      1.36522790029e-05  1.19999999981e-05  1.69999999997e-05
    urlparse          9.25991500001  8.87815436242e-06  8.00000000822e-06  1.10000000006e-05
    cyuri             5.742724       5.50596740172e-06  5.00000000159e-06  7.00000001075e-06

    hostname:
    ----              ----           ----               ----               ----
    urlparse4         1.883982       1.80631064237e-06  1.99999999495e-06  2.00000000916e-06
    pygurl            1.67332099999  1.60433461169e-06  1.99999999495e-06  2.00000000916e-06
    uritools          3.31632199999  3.17959923297e-06  3.00000000664e-06  4.00000000411e-06
    yurl              3.853319       3.69445733461e-06  3.00000000664e-06  4.00000000411e-06
    urlparse2         4.641513       4.45015627996e-06  4.00000000411e-06  5.99999999906e-06
    urlparse          5.122682       4.91148801534e-06  4.00000000411e-06  5.99999999906e-06
    cyuri             11.108649      1.06506701822e-05  9.0000000057e-06   1.5999999988e-05
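
As a sanity check on the table, Mean equals Sum divided by the 1,043,000 timed calls (104300 URLs x 10 runs), and speed-ups relative to ``urlparse`` follow directly from the Sum column. The snippet below only re-derives figures already printed above; ``summarize`` is a hypothetical helper.

```python
# Sum column (seconds) copied from the benchmark output above.
SUMS = {
    "urlsplit": {
        "urlparse4": 1.681858, "pygurl": 2.031712, "uritools": 2.638991,
        "yurl": 3.910247, "urlparse2": 3.756782, "urlparse": 3.862006,
        "cyuri": 9.912275,
    },
    "urljoin_sibling": {
        "urlparse4": 2.008453, "pygurl": 2.193427, "uritools": 10.575344,
        "yurl": 13.213052, "urlparse2": 14.239327, "urlparse": 9.25991500001,
        "cyuri": 5.742724,
    },
    "hostname": {
        "urlparse4": 1.883982, "pygurl": 1.67332099999, "uritools": 3.31632199999,
        "yurl": 3.853319, "urlparse2": 4.641513, "urlparse": 5.122682,
        "cyuri": 11.108649,
    },
}
CALLS = 104300 * 10  # URLs x runs


def summarize(operation):
    """Return (fastest library, urlparse4 speed-up over urlparse, urlparse4 mean)."""
    sums = SUMS[operation]
    fastest = min(sums, key=sums.get)
    speedup = sums["urlparse"] / sums["urlparse4"]
    mean = sums["urlparse4"] / CALLS
    return fastest, speedup, mean


for op in ("urlsplit", "urljoin_sibling", "hostname"):
    fastest, speedup, mean = summarize(op)
    print("%-16s fastest=%-10s urlparse4 vs urlparse: %.2fx  mean=%.3e s"
          % (op, fastest, speedup, mean))
```

This shows ``urlparse4`` as the fastest library for ``urlsplit`` and ``urljoin_sibling`` (about 2.3x and 4.6x faster than ``urlparse``), while ``pygurl`` is slightly faster for hostname extraction (1.67 s vs 1.88 s in total), where ``urlparse4`` is still about 2.7x faster than ``urlparse``.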

Some libraries are included in the benchmark code but disabled for
various reasons:

-  `urlparse3 <https://pypi.python.org/pypi/urlparse3/>`__ (raises
   exceptions on valid URLs)
-  `slimurl <https://github.com/mosquito/slimurl>`__ (too slow)

Feel free to submit pull requests to add new ones!

Feedback
--------

We'd love to hear your feedback! Feel free to look at the issues on
GitHub and open new ones if needed :)
            

Raw data

            {
    "maintainer": null, 
    "docs_url": null, 
    "requires_python": null, 
    "maintainer_email": null, 
    "cheesecake_code_kwalitee_id": null, 
    "keywords": "urlparse,urlsplit,urljoin,url,parser,urlparser,parsing,gurl,cython,faster,speed,performance", 
    "upload_time": "2016-07-10 23:48:37", 
    "requirements": [
        {
            "name": "tabulate", 
            "specs": [
                [
                    "==", 
                    "0.7.5"
                ]
            ]
        }, 
        {
            "name": "Cython", 
            "specs": [
                [
                    "==", 
                    "0.24"
                ]
            ]
        }, 
        {
            "name": "pytest", 
            "specs": [
                [
                    "==", 
                    "2.9.2"
                ]
            ]
        }, 
        {
            "name": "uritools", 
            "specs": [
                [
                    "==", 
                    "1.0.2"
                ]
            ]
        }, 
        {
            "name": "YURL", 
            "specs": [
                [
                    "==", 
                    "0.13"
                ]
            ]
        }, 
        {
            "name": "urlparse2", 
            "specs": [
                [
                    "==", 
                    "1.1.1"
                ]
            ]
        }, 
        {
            "name": "urlparse3", 
            "specs": [
                [
                    "==", 
                    "1.0.9"
                ]
            ]
        }, 
        {
            "name": "slimurl", 
            "specs": [
                [
                    "==", 
                    "0.7.2"
                ]
            ]
        }, 
        {
            "name": "gurl-cython", 
            "specs": []
        }
    ], 
    "author": "Common Search contributors", 
    "home_page": "https://github.com/commonsearch/urlparse4", 
    "github_user": "commonsearch", 
    "download_url": "https://pypi.python.org/packages/af/6f/a2d1a397b47ce3af6c5bb8936a7a8f930bf29b4df42081da842c5c84c1d1/urlparse4-0.1.3.tar.gz", 
    "platform": "any", 
    "version": "0.1.3", 
    "cheesecake_documentation_id": null, 
    "lcname": "urlparse4", 
    "bugtrack_url": null, 
    "github": true, 
    "coveralls": true, 
    "name": "urlparse4", 
    "license": "Apache License, Version 2.0", 
    "travis_ci": false, 
    "github_project": "urlparse4", 
    "summary": "Performance-focused replacement for Python's urlparse module", 
    "split_keywords": [
        "urlparse", 
        "urlsplit", 
        "urljoin", 
        "url", 
        "parser", 
        "urlparser", 
        "parsing", 
        "gurl", 
        "cython", 
        "faster", 
        "speed", 
        "performance"
    ], 
    "author_email": "contact@commonsearch.org", 
    "urls": [
        {
            "has_sig": false, 
            "upload_time": "2016-07-10T23:48:37", 
            "comment_text": "", 
            "python_version": "source", 
            "url": "https://pypi.python.org/packages/af/6f/a2d1a397b47ce3af6c5bb8936a7a8f930bf29b4df42081da842c5c84c1d1/urlparse4-0.1.3.tar.gz", 
            "md5_digest": "026865e0c0a035f3cee0025f1c0983a7", 
            "downloads": 0, 
            "filename": "urlparse4-0.1.3.tar.gz", 
            "packagetype": "sdist", 
            "path": "af/6f/a2d1a397b47ce3af6c5bb8936a7a8f930bf29b4df42081da842c5c84c1d1/urlparse4-0.1.3.tar.gz", 
            "size": 158431
        }
    ], 
    "_id": null, 
    "cheesecake_installability_id": null
}