Lexery
======

.. image:: https://github.com/Parquery/lexery/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/Parquery/lexery/actions/workflows/ci.yml
    :alt: Continuous integration

.. image:: https://coveralls.io/repos/github/Parquery/lexery/badge.svg?branch=master
    :target: https://coveralls.io/github/Parquery/lexery?branch=master
    :alt: Coverage

.. image:: https://badge.fury.io/py/lexery.svg
    :target: https://pypi.org/project/lexery/
    :alt: PyPI - version

.. image:: https://img.shields.io/pypi/pyversions/lexery.svg
    :target: https://pypi.org/project/lexery/
    :alt: PyPI - Python Version

A simple lexer based on regular expressions.

Inspired by https://eli.thegreenplace.net/2013/06/25/regex-based-lexical-analysis-in-python-and-javascript

Usage
=====
You define the lexing rules and lexery applies them iteratively to the text, producing one list of tokens per input line:

.. code-block:: python

    >>> import lexery
    >>> import re
    >>> text = 'crop \t   ( 20, 30, 40, 10 ) ;'
    >>>
    >>> lexer = lexery.Lexer(
    ...     rules=[
    ...         lexery.Rule(identifier='identifier',
    ...             pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
    ...         lexery.Rule(identifier='lpar', pattern=re.compile(r'\(')),
    ...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
    ...         lexery.Rule(identifier='rpar', pattern=re.compile(r'\)')),
    ...         lexery.Rule(identifier='comma', pattern=re.compile(r',')),
    ...         lexery.Rule(identifier='semi', pattern=re.compile(r';'))
    ...     ],
    ...     skip_whitespace=True)
    >>> tokens = lexer.lex(text=text)
    >>> assert tokens == [[
    ...     lexery.Token('identifier', 'crop', 0, 0), 
    ...     lexery.Token('lpar', '(', 9, 0),
    ...     lexery.Token('number', '20', 11, 0),
    ...     lexery.Token('comma', ',', 13, 0),
    ...     lexery.Token('number', '30', 15, 0),
    ...     lexery.Token('comma', ',', 17, 0),
    ...     lexery.Token('number', '40', 19, 0),
    ...     lexery.Token('comma', ',', 21, 0),
    ...     lexery.Token('number', '10', 23, 0),
    ...     lexery.Token('rpar', ')', 26, 0),
    ...     lexery.Token('semi', ';', 28, 0)]]
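
Each ``Token`` is created with an identifier, the matched content, the position in the line and the line number. Assuming ``Token`` exposes these constructor arguments as attributes of the same name (an assumption for illustration, not part of the documented API), you can filter the result, for example, to collect all numbers:

.. code-block:: python

    >>> # Assumes Token exposes identifier/content/position/lineno attributes
    >>> [token.content
    ...  for line in tokens for token in line
    ...  if token.identifier == 'number']
    ['20', '30', '40', '10']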

Mind that if a part of the text cannot be matched, a ``lexery.Error`` is raised:

.. code-block:: python

    >>> import lexery
    >>> import re
    >>> text = 'some-identifier ( 23 )'
    >>>
    >>> lexer = lexery.Lexer(
    ...     rules=[
    ...         lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
    ...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
    ...     ],
    ...     skip_whitespace=True)
    >>> tokens = lexer.lex(text=text)
    Traceback (most recent call last):
    ...
    lexery.Error: Unmatched text at line 0 and position 4:
    some-identifier ( 23 )
        ^
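
Since ``lexery.Error`` is an exception like any other, you can catch it and handle the invalid input gracefully (a small sketch continuing the example above):

.. code-block:: python

    >>> try:
    ...     _ = lexer.lex(text=text)
    ... except lexery.Error:
    ...     print('The input could not be lexed.')
    The input could not be lexed.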

If you specify an ``unmatched_identifier``, consecutive unmatched characters are accumulated into tokens with that identifier instead of raising an error:

.. code-block:: python

    >>> import lexery
    >>> import re
    >>> text = 'some-identifier ( 23 )-'
    >>>
    >>> lexer = lexery.Lexer(
    ...     rules=[
    ...         lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
    ...         lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
    ...     ],
    ...     skip_whitespace=True,
    ...     unmatched_identifier='unmatched')
    >>> tokens = lexer.lex(text=text)
    >>> assert tokens == [[
    ...     lexery.Token('identifier', 'some', 0, 0),
    ...     lexery.Token('unmatched', '-', 4, 0),
    ...     lexery.Token('identifier', 'identifier', 5, 0),
    ...     lexery.Token('unmatched', '(', 16, 0),
    ...     lexery.Token('number', '23', 18, 0),
    ...     lexery.Token('unmatched', ')-', 21, 0)]]
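
This lets you report or post-process the unmatched parts instead of failing outright. For example, again assuming the ``Token`` attributes from above (a sketch):

.. code-block:: python

    >>> # Assumes Token exposes identifier/content/position attributes
    >>> [(token.content, token.position)
    ...  for line in tokens for token in line
    ...  if token.identifier == 'unmatched']
    [('-', 4), ('(', 16), (')-', 21)]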


Installation
============

* Install lexery with pip:

.. code-block:: bash

    pip3 install lexery

Development
===========

* Check out the repository.

* In the repository root, create the virtual environment:

.. code-block:: bash

    python3 -m venv venv3

* Activate the virtual environment:

.. code-block:: bash

    source venv3/bin/activate

* Install the development dependencies:

.. code-block:: bash

    pip3 install -e .[dev]

Pre-commit Checks
-----------------
We provide a set of pre-commit checks that run the unit tests, lint the code and check its formatting.

Namely, we use:

* `yapf <https://github.com/google/yapf>`_ to check the formatting,
* `pydocstyle <https://github.com/PyCQA/pydocstyle>`_ to check the style of the docstrings,
* `mypy <http://mypy-lang.org/>`_ for static type analysis, and
* `pylint <https://www.pylint.org/>`_ for various linter checks.

Run the pre-commit checks locally from an activated virtual environment with development dependencies:

.. code-block:: bash

    ./precommit.py

The pre-commit script can also automatically format the code:

.. code-block:: bash

    ./precommit.py --overwrite


Versioning
==========
We follow `Semantic Versioning <http://semver.org/spec/v1.0.0.html>`_. The version X.Y.Z indicates:

* X is the major version (backward-incompatible),
* Y is the minor version (backward-compatible), and
* Z is the patch version (backward-compatible bug fix).
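
Since minor and patch releases are backward-compatible, downstream projects can safely pin lexery up to the next major version, for example (an illustrative constraint, not a requirement):

.. code-block:: bash

    pip3 install 'lexery>=1.2,<2'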
