============
price-parser
============

:Author: Mikhail Korobov
:Version: 0.3.4
:Date: 2020-11-25
:Homepage: https://github.com/scrapinghub/price-parser

.. image:: https://img.shields.io/pypi/v/price-parser.svg
   :target: https://pypi.python.org/pypi/price-parser
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/price-parser.svg
   :target: https://pypi.python.org/pypi/price-parser
   :alt: Supported Python Versions

.. image:: https://travis-ci.org/scrapinghub/price-parser.svg?branch=master
   :target: https://travis-ci.org/scrapinghub/price-parser
   :alt: Build Status

.. image:: https://codecov.io/github/scrapinghub/price-parser/coverage.svg?branch=master
   :target: https://codecov.io/gh/scrapinghub/price-parser
   :alt: Coverage report


``price-parser`` is a small library for extracting price and currency from
raw text strings.

Features:

* robust price amount and currency symbol extraction
* zero-effort handling of thousand and decimal separators

The main use case is parsing prices extracted from web pages.
For example, you can write a CSS/XPath selector which targets an element
with a price, and then use this library for cleaning it up,
instead of writing custom site-specific regex or Python code.
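
As a minimal sketch of that workflow, the snippet below pulls the text of a
price element out of a small, made-up page fragment and hands it to
``Price.fromstring``. The markup and the ``price`` class name are hypothetical,
and the standard library's ``xml.etree.ElementTree`` stands in for whatever
scraping library you actually use (it only accepts well-formed markup):

.. code-block:: python

    import xml.etree.ElementTree as ET

    from price_parser import Price

    # Hypothetical, well-formed page fragment used only for illustration.
    PAGE = '<div><h1>Widget</h1><span class="price">Price: $119.00</span></div>'

    root = ET.fromstring(PAGE)
    # ElementTree's limited XPath: select the element that carries the price.
    raw_text = root.find(".//span[@class='price']").text

    # No site-specific regex: the library strips the "Price: " prefix itself.
    price = Price.fromstring(raw_text)
    print(raw_text)        # Price: $119.00
    print(price.amount)    # 119.00
    print(price.currency)  # $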

License is BSD 3-clause.

Installation
============

::

    pip install price-parser

price-parser requires Python 3.6+.

Usage
=====

Basic usage
-----------

>>> from price_parser import Price
>>> price = Price.fromstring("22,90 €")
>>> price
Price(amount=Decimal('22.90'), currency='€')
>>> price.amount       # numeric price amount
Decimal('22.90')
>>> price.currency     # currency symbol, as appears in the string
'€'
>>> price.amount_text  # price amount, as appears in the string
'22,90'
>>> price.amount_float # price amount as float, not Decimal
22.9
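
Because ``amount`` is a ``Decimal``, parsed prices can be added up without
picking up binary floating-point rounding. A small sketch, where the
``raw_prices`` list is made up for illustration and every string is assumed
to contain a parseable amount:

.. code-block:: python

    from decimal import Decimal

    from price_parser import Price

    raw_prices = ["22,90 €", "22,90 €", "22,90 €"]  # made-up input
    prices = [Price.fromstring(raw) for raw in raw_prices]

    # Start from Decimal(0) so the running total stays a Decimal throughout.
    total = sum((p.amount for p in prices), Decimal(0))
    print(total)  # 68.70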

``Price.fromstring`` also has an alias, ``price_parser.parse_price``;
the two do the same thing:

>>> from price_parser import parse_price
>>> parse_price("22,90 €")
Price(amount=Decimal('22.90'), currency='€')

The library has extensive tests (900+ real-world examples of price strings).
Some of the supported cases are described below.

Supported cases
---------------

Unclean price strings with various currencies are supported;
thousand separators and decimal separators are handled:

>>> Price.fromstring("Price: $119.00")
Price(amount=Decimal('119.00'), currency='$')

>>> Price.fromstring("15 130 Р")
Price(amount=Decimal('15130'), currency='Р')

>>> Price.fromstring("151,200 تومان")
Price(amount=Decimal('151200'), currency='تومان')

>>> Price.fromstring("Rp 1.550.000")
Price(amount=Decimal('1550000'), currency='Rp')

>>> Price.fromstring("Běžná cena 75 990,00 Kč")
Price(amount=Decimal('75990.00'), currency='Kč')


In the wild, the euro sign is sometimes used as a decimal separator:

>>> Price.fromstring("1,235€ 99")
Price(amount=Decimal('1235.99'), currency='€')

>>> Price.fromstring("99 € 95 €")
Price(amount=Decimal('99'), currency='€')

>>> Price.fromstring("35€ 999")
Price(amount=Decimal('35'), currency='€')


Some special cases are handled:

>>> Price.fromstring("Free")
Price(amount=Decimal('0'), currency=None)


When the price or the currency can't be extracted, the corresponding
attribute is set to ``None``:

>>> Price.fromstring("")
Price(amount=None, currency=None)

>>> Price.fromstring("Foo")
Price(amount=None, currency=None)

>>> Price.fromstring("50% OFF")
Price(amount=None, currency=None)

>>> Price.fromstring("50")
Price(amount=Decimal('50'), currency=None)

>>> Price.fromstring("R$")
Price(amount=None, currency='R$')
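
Since either attribute can be ``None``, calling code usually needs to check
both before using them. The helper below is a sketch of one way to do that;
``describe`` is a hypothetical name, not part of price-parser. It compares
against ``None`` explicitly because an amount of ``Decimal('0')`` (as returned
for ``"Free"``) is falsy:

.. code-block:: python

    from price_parser import Price


    def describe(raw: str) -> str:
        """Hypothetical helper: summarise what could be extracted from ``raw``."""
        price = Price.fromstring(raw)
        # Use ``is None``: Decimal('0') is a valid amount but evaluates as False.
        if price.amount is None:
            return "no price found"
        if price.currency is None:
            return f"{price.amount} (currency unknown)"
        return f"{price.amount} {price.currency}"


    print(describe("Price: $119.00"))  # 119.00 $
    print(describe("50"))              # 50 (currency unknown)
    print(describe("Free"))            # 0 (currency unknown)
    print(describe("50% OFF"))         # no price found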


Currency hints
--------------

The ``currency_hint`` argument lets you pass a text string which may (or may
not) contain currency information. This feature is most useful for automated
price extraction.

>>> Price.fromstring("34.99", currency_hint="руб. (шт)")
Price(amount=Decimal('34.99'), currency='руб.')

Note that a currency mentioned in the main price string may be
**preferred** over the currency specified in the ``currency_hint`` argument;
which one wins depends on the currency symbols found in the price string.
If you know the correct currency, you can set it directly:

>>> price = Price.fromstring("1 000")
>>> price.currency = 'EUR'
>>> price
Price(amount=Decimal('1000'), currency='EUR')
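
In a scraper, the hint typically comes from a separate element next to the
amount. The sketch below passes such a hint through and then maps the
extracted symbol to an ISO code. Both ``parse_with_hint`` and the
``SYMBOL_TO_ISO`` table are hypothetical and not part of price-parser;
price-parser returns the currency as it appears in the text, so any
normalisation like this is up to the caller:

.. code-block:: python

    from decimal import Decimal
    from typing import Optional, Tuple

    from price_parser import Price

    # Hypothetical, deliberately tiny mapping; extend it for your own data.
    SYMBOL_TO_ISO = {"€": "EUR", "руб.": "RUB"}


    def parse_with_hint(
        amount_text: str, hint_text: Optional[str] = None
    ) -> Tuple[Optional[Decimal], Optional[str]]:
        """Hypothetical helper: return (amount, ISO code or None)."""
        price = Price.fromstring(amount_text, currency_hint=hint_text)
        # Symbols missing from the table (or a missing currency) map to None.
        return price.amount, SYMBOL_TO_ISO.get(price.currency)


    print(parse_with_hint("34.99", "руб. (шт)"))  # (Decimal('34.99'), 'RUB')
    print(parse_with_hint("22,90 €"))             # (Decimal('22.90'), 'EUR')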


Decimal separator
-----------------

If you know which symbol is used as a decimal separator in the input string,
pass that symbol in the ``decimal_separator`` argument to prevent price-parser
from guessing the wrong decimal separator symbol.

>>> Price.fromstring("Price: $140.600", decimal_separator=".")
Price(amount=Decimal('140.600'), currency='$')

>>> Price.fromstring("Price: $140.600", decimal_separator=",")
Price(amount=Decimal('140600'), currency='$')
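
When prices come from several sources with known number formats, the
separator can be chosen per source. The following is a sketch under that
assumption; ``DECIMAL_SEPARATORS`` and ``parse_for_source`` are hypothetical
names, and the source labels are made up. Sources without an entry fall back
to price-parser's own detection, because ``decimal_separator`` is then left
at its default of ``None``:

.. code-block:: python

    from typing import Optional

    from price_parser import Price

    # Hypothetical per-source configuration: which symbol marks the decimals.
    DECIMAL_SEPARATORS = {"us-shop": ".", "de-shop": ","}


    def parse_for_source(raw: str, source: str) -> Price:
        """Hypothetical helper: parse ``raw`` using the source's known separator."""
        separator: Optional[str] = DECIMAL_SEPARATORS.get(source)
        return Price.fromstring(raw, decimal_separator=separator)


    print(parse_for_source("Price: $140.600", "us-shop").amount)  # 140.600
    print(parse_for_source("Price: $140.600", "de-shop").amount)  # 140600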


Contributing
============

* Source code: https://github.com/scrapinghub/price-parser
* Issue tracker: https://github.com/scrapinghub/price-parser/issues

Use tox_ to run tests with different Python versions::

    tox

The command above also runs type checks; we use mypy.

.. _tox: https://tox.readthedocs.io



Changes
=======

0.3.4 (2020-11-25)
------------------

* Improved parsing of prices without digits before a decimal point ('.75'),
  https://github.com/scrapinghub/price-parser/pull/42
* Fixed parsing of prices with non-breaking spaces,
  https://github.com/scrapinghub/price-parser/pull/43

0.3.3 (2020-02-05)
------------------

* Fixed installation issue on some Windows machines.

0.3.2 (2020-01-28)
------------------

* Improved Korean and Japanese currency detection.
* Declare Python 3.8 support.

0.3.1 (2019-10-21)
------------------

* Redundant $ signs are no longer returned as a part of currency, e.g.
  for ``SGD$ 100`` currency would be ``SGD``, not ``SGD$``.

0.3.0 (2019-10-19)
------------------

* New ``Price.fromstring`` argument ``decimal_separator`` allows overriding
  the decimal separator in cases where it is known
  (i.e. disabling decimal separator detection);
* NTD and RBM unofficial currency names are added;
* quantifiers in regular expressions are made non-greedy, which provides
  a small speedup;
* test improvements.

0.2.4 (2019-07-03)
------------------

* Declare price-parser as providing type annotations (PEP 561). This enables
  better type checking for projects using price-parser.
* Improved test coverage.

0.2.3 (2019-06-18)
------------------

* Follow-up for 0.2.2 release: improved parsing of prices with 4+ digits
  after a decimal separator.

0.2.2 (2019-06-18)
------------------

* Fixed parsing of prices with 4+ digits after a decimal separator.

0.2.1 (2019-04-19)
------------------

* 23 additional currency symbols are added;
* ``A$`` alias for Australian Dollar is added.

0.2 (2019-04-12)
----------------

Added support for currencies replaced by euro.

0.1.1 (2019-04-12)
------------------

Minor packaging fixes.

0.1 (2019-04-12)
----------------

Initial release.



            
