Lexery
======
.. image:: https://github.com/Parquery/lexery/actions/workflows/ci.yml/badge.svg
:target: https://github.com/Parquery/lexery/actions/workflows/ci.yml
:alt: Continuous integration
.. image:: https://coveralls.io/repos/github/Parquery/lexery/badge.svg?branch=master
:target: https://coveralls.io/github/Parquery/lexery?branch=master
:alt: Coverage
.. image:: https://badge.fury.io/py/lexery.svg
:target: https://pypi.org/project/lexery/
:alt: PyPI - version
.. image:: https://img.shields.io/pypi/pyversions/lexery.svg
:target: https://pypi.org/project/lexery/
:alt: PyPI - Python Version
A simple lexer based on regular expressions.
Inspired by https://eli.thegreenplace.net/2013/06/25/regex-based-lexical-analysis-in-python-and-javascript
Usage
=====
You define the lexing rules, and lexery matches them iteratively against the text as a look-up:
.. code-block:: python
>>> import lexery
>>> import re
>>> text = 'crop \t ( 20, 30, 40, 10 ) ;'
>>>
>>> lexer = lexery.Lexer(
... rules=[
... lexery.Rule(identifier='identifier',
... pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
... lexery.Rule(identifier='lpar', pattern=re.compile(r'\(')),
... lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
... lexery.Rule(identifier='rpar', pattern=re.compile(r'\)')),
... lexery.Rule(identifier='comma', pattern=re.compile(r',')),
... lexery.Rule(identifier='semi', pattern=re.compile(r';'))
... ],
... skip_whitespace=True)
>>> tokens = lexer.lex(text=text)
>>> assert tokens == [[
... lexery.Token('identifier', 'crop', 0, 0),
... lexery.Token('lpar', '(', 9, 0),
... lexery.Token('number', '20', 11, 0),
... lexery.Token('comma', ',', 13, 0),
... lexery.Token('number', '30', 15, 0),
... lexery.Token('comma', ',', 17, 0),
... lexery.Token('number', '40', 19, 0),
... lexery.Token('comma', ',', 21, 0),
... lexery.Token('number', '10', 23, 0),
... lexery.Token('rpar', ')', 26, 0),
... lexery.Token('semi', ';', 28, 0)]]
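Note that ``lex`` returns one list of tokens per line of the input, which is why the result above is a single nested list. Below is a minimal sketch of lexing multi-line text, continuing with the ``lexer`` defined above; the per-line shape of the result is inferred from the nested list and from the line number stored in each token, not from official documentation:

.. code-block:: python

    # Sketch only: it is assumed that ``lex`` returns one inner list of tokens
    # per input line.
    multiline_tokens = lexer.lex(text='crop ( 20 ) ;\ncrop ( 30 ) ;')
    for lineno, line_tokens in enumerate(multiline_tokens):
        print('Line {} produced {} token(s)'.format(lineno, len(line_tokens)))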
Mind that if a part of the text cannot be matched, a ``lexery.Error`` is raised:
.. code-block:: python
>>> import lexery
>>> import re
>>> text = 'some-identifier ( 23 )'
>>>
>>> lexer = lexery.Lexer(
... rules=[
... lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
... lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
... ],
... skip_whitespace=True)
>>> tokens = lexer.lex(text=text)
Traceback (most recent call last):
...
lexery.Error: Unmatched text at line 0 and position 4:
some-identifier ( 23 )
^
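If you prefer to handle the failure yourself instead of letting it propagate, catch the exception as usual. A minimal, self-contained sketch:

.. code-block:: python

    import re

    import lexery

    lexer = lexery.Lexer(
        rules=[
            lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
            lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
        ],
        skip_whitespace=True)

    try:
        tokens = lexer.lex(text='some-identifier ( 23 )')
    except lexery.Error as error:
        # The message pinpoints the line and the position of the unmatched text,
        # as shown in the traceback above.
        print('Lexing failed: {}'.format(error))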
If you specify an ``unmatched_identifier``, all the unmatched characters are accumulated in tokens with that identifier:
.. code-block:: python
>>> import lexery
>>> import re
>>> text = 'some-identifier ( 23 )-'
>>>
>>> lexer = lexery.Lexer(
... rules=[
... lexery.Rule(identifier='identifier', pattern=re.compile(r'[a-zA-Z_][a-zA-Z_]*')),
... lexery.Rule(identifier='number', pattern=re.compile(r'[1-9][0-9]*')),
... ],
... skip_whitespace=True,
... unmatched_identifier='unmatched')
>>> tokens = lexer.lex(text=text)
>>> assert tokens == [[
... lexery.Token('identifier', 'some', 0, 0),
... lexery.Token('unmatched', '-', 4, 0),
... lexery.Token('identifier', 'identifier', 5, 0),
... lexery.Token('unmatched', '(', 16, 0),
... lexery.Token('number', '23', 18, 0),
... lexery.Token('unmatched', ')-', 21, 0)]]
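This makes it possible to lex leniently and report the problems afterwards. Here is a sketch of such post-processing, assuming that ``lexery.Token`` exposes its first constructor argument as an ``identifier`` attribute (the attribute name is an assumption, not taken from the documentation):

.. code-block:: python

    # Sketch only: ``token.identifier`` is assumed to mirror the first argument
    # passed to ``lexery.Token`` in the examples above.
    unmatched_tokens = [
        token
        for line_tokens in tokens
        for token in line_tokens
        if token.identifier == 'unmatched']

    if unmatched_tokens:
        print('The input contains {} unmatched part(s).'.format(len(unmatched_tokens)))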
Installation
============
* Install lexery with pip:
.. code-block:: bash
pip3 install lexery
Development
===========
* Check out the repository.
* In the repository root, create the virtual environment:
.. code-block:: bash
python3 -m venv venv3
* Activate the virtual environment:
.. code-block:: bash
source venv3/bin/activate
* Install the development dependencies:
.. code-block:: bash
pip3 install -e .[dev]
Pre-commit Checks
-----------------
We provide a set of pre-commit checks that run the unit tests, lint the code and check its formatting.
Namely, we use:
* `yapf <https://github.com/google/yapf>`_ checks the formatting.
* `pydocstyle <https://github.com/PyCQA/pydocstyle>`_ checks the style of the docstrings.
* `mypy <http://mypy-lang.org/>`_ performs static type analysis.
* `pylint <https://www.pylint.org/>`_ runs various linter checks.
Run the pre-commit checks locally from an activated virtual environment with development dependencies:
.. code-block:: bash
./precommit.py
* The pre-commit script can also automatically format the code:
.. code-block:: bash
./precommit.py --overwrite
Versioning
==========
We follow `Semantic Versioning <http://semver.org/spec/v1.0.0.html>`_. The version X.Y.Z indicates:
* X is the major version (backward-incompatible),
* Y is the minor version (backward-compatible), and
* Z is the patch version (backward-compatible bug fix).