rebulk

Name	rebulk
Version	3.2.0
home_page	https://github.com/Toilal/rebulk/
Summary	Rebulk - Define simple search patterns in bulk to perform advanced matching on any string.
upload_time	2023-02-18 09:10:14
author	Rémi Alvergnat
license	MIT
keywords	re regexp regular expression search pattern string match

ReBulk
======

[![Latest Version](http://img.shields.io/pypi/v/rebulk.svg)](https://pypi.python.org/pypi/rebulk)
[![MIT License](http://img.shields.io/badge/license-MIT-blue.svg)](https://pypi.python.org/pypi/rebulk)
[![Build Status](https://img.shields.io/github/workflow/status/Toilal/rebulk/ci)](https://github.com/Toilal/rebulk/actions?query=workflow%3Aci)
[![Coveralls](http://img.shields.io/coveralls/Toilal/rebulk.svg)](https://coveralls.io/r/Toilal/rebulk?branch=master)
[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/relekang/python-semantic-release)


ReBulk is a Python library that performs advanced searches in strings
that would be hard to implement using the [re
module](https://docs.python.org/3/library/re.html) or [string
methods](https://docs.python.org/3/library/stdtypes.html#str) alone.

It includes features such as `Patterns`, `Match` and `Rule` that allow
developers to build custom, complex string matchers using a readable
and extensible API.

This project is hosted on GitHub: <https://github.com/Toilal/rebulk>

Install
=======

```sh
$ pip install rebulk
```

Usage
=====

Regular expression, string and function based patterns are declared in a
`Rebulk` object. It uses a fluent API to chain `string`, `regex`, and
`functional` methods that define the various pattern types.

```python
>>> from rebulk import Rebulk
>>> bulk = Rebulk().string('brown').regex(r'qu\w+').functional(lambda s: (20, 25))
```

Once the `Rebulk` object is fully configured, call its `matches` method
with an input string to retrieve all `Match` objects found by the
registered patterns.

```python
>>> bulk.matches("The quick brown fox jumps over the lazy dog")
[<brown:(10, 15)>, <quick:(4, 9)>, <jumps:(20, 25)>]
```

If multiple `Match` objects are found at the same position, only the
longest one is kept.

```python
>>> bulk = Rebulk().string('lakers').string('la')
>>> bulk.matches("the lakers are from la")
[<lakers:(4, 10)>, <la:(20, 22)>]
```
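The "longest match wins" behavior can be pictured with a small standalone sketch. This is not rebulk's `ConflictSolver` code: it models matches as plain `(start, end)` tuples, and the helper name `keep_longest` is hypothetical. As a simplification, overlapping spans of equal length are all kept.

```python
def keep_longest(spans):
    """Drop every span that overlaps a strictly longer span.

    Illustrative only: spans are (start, end) tuples, not rebulk Match
    objects, and overlapping spans of equal length are all kept.
    """
    kept = []
    for start, end in spans:
        overlaps_longer = any(
            s < end and start < e and (e - s) > (end - start)
            for s, e in spans
        )
        if not overlaps_longer:
            kept.append((start, end))
    return kept


# 'lakers' (4, 10) and 'la' (4, 6) overlap; only the longer one survives.
print(keep_longest([(4, 10), (4, 6), (20, 22)]))  # [(4, 10), (20, 22)]
```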

String Patterns
===============

String patterns rely on the
[str.find](https://docs.python.org/3/library/stdtypes.html#str.find)
method to find matches, but return every occurrence in the string rather
than only the first one. Set `ignore_case=True` to match regardless of case.

```python
>>> Rebulk().string('la').matches("lalalilala")
[<la:(0, 2)>, <la:(2, 4)>, <la:(6, 8)>, <la:(8, 10)>]

>>> Rebulk().string('la').matches("LalAlilAla")
[<la:(8, 10)>]

>>> Rebulk().string('la', ignore_case=True).matches("LalAlilAla")
[<La:(0, 2)>, <lA:(2, 4)>, <lA:(6, 8)>, <la:(8, 10)>]
```
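The idea of "str.find, but every occurrence" can be sketched in plain Python. This is an illustrative re-implementation, not rebulk's own code; the function name `find_all` is used here for illustration, and case folding with `str.lower()` is assumed to keep indices aligned, which holds for ASCII text. Slicing the original string with the returned spans gives values such as `La` and `lA`, as in the example above.

```python
def find_all(needle, haystack, ignore_case=False):
    """Return the (start, end) span of every occurrence of needle.

    Standalone sketch, not rebulk's code. Case folding uses str.lower(),
    which keeps indices aligned for ASCII text.
    """
    if not needle:
        return []
    search_in = haystack
    if ignore_case:
        needle, search_in = needle.lower(), haystack.lower()
    spans = []
    index = search_in.find(needle)
    while index > -1:
        spans.append((index, index + len(needle)))
        # Resume the search right after the current occurrence.
        index = search_in.find(needle, index + len(needle))
    return spans


print(find_all('la', 'LalAlilAla'))                    # [(8, 10)]
print(find_all('la', 'LalAlilAla', ignore_case=True))  # [(0, 2), (2, 4), (6, 8), (8, 10)]
```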

You can define several patterns with a single `string` method call.

```python
>>> Rebulk().string('Winter', 'coming').matches("Winter is coming...")
[<Winter:(0, 6)>, <coming:(10, 16)>]
```

Regular Expression Patterns
===========================

Regular expression patterns are based on a compiled regular expression,
and the [re.finditer](https://docs.python.org/3/library/re.html#re.finditer)
method is used to find matches.

If the [regex module](https://pypi.python.org/pypi/regex) is installed,
rebulk can use it instead of the default [re
module](https://docs.python.org/3/library/re.html). It is disabled by default;
enable it by setting the `REBULK_REGEX_ENABLED=1` environment variable.

```python
>>> Rebulk().regex(r'l\w').matches("lolita")
[<lo:(0, 2)>, <li:(2, 4)>]
```

You can define several patterns with a single `regex` method call.

```python
>>> Rebulk().regex(r'Wint\wr', r'com\w{3}').matches("Winter is coming...")
[<Winter:(0, 6)>, <coming:(10, 16)>]
```

All keyword arguments from
[re.compile](https://docs.python.org/3/library/re.html#re.compile) are
supported.

```python
>>> import re  # import required for flags constant
>>> Rebulk().regex('L[A-Z]KERS', flags=re.IGNORECASE) \
...         .matches("The LaKeRs are from La")
[<LaKeRs:(4, 10)>]

>>> Rebulk().regex('L[A-Z]', 'L[A-Z]KERS', flags=re.IGNORECASE) \
...         .matches("The LaKeRs are from La")
[<La:(20, 22)>, <LaKeRs:(4, 10)>]

>>> Rebulk().regex(('L[A-Z]', re.IGNORECASE), ('L[a-z]KeRs')) \
...         .matches("The LaKeRs are from La")
[<La:(20, 22)>, <LaKeRs:(4, 10)>]
```

If the [regex module](https://pypi.python.org/pypi/regex) is available and
enabled, repeated captures are supported automatically.

```python
>>> # If regex module is available, repeated_captures is True by default.
>>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+').matches("01-02-03-04")
>>> matches[0].children # doctest:+SKIP
[<01:(0, 2)>, <02:(3, 5)>, <03:(6, 8)>, <04:(9, 11)>]

>>> # If regex module is not available, or if repeated_captures is forced to False.
>>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+', repeated_captures=False) \
...                   .matches("01-02-03-04")
>>> matches[0].children
[<01:(0, 2)+initiator=01-02-03-04>, <04:(9, 11)+initiator=01-02-03-04>]
```
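The reason only `01` and `04` survive without repeated captures comes from the standard `re` module itself: a group inside a repeated construct only remembers its last capture. The snippet below uses `re` alone to show this; the third-party regex module, by contrast, exposes every capture of a group (for instance through its `captures()` match method), which is what repeated captures build on.

```python
import re

# With the standard re module, a group inside a repeated construct keeps
# only its last capture, so group 2 ends up holding "04".
match = re.match(r'(\d+)(?:-(\d+))+', "01-02-03-04")
print(match.groups())                # ('01', '04')
print(match.span(1), match.span(2))  # (0, 2) (9, 11)
```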

-   `abbreviations`

    Defined as a list of 2-tuples, each tuple being an abbreviation. It
    simply replaces `tuple[0]` with `tuple[1]` in the expression.

    ```python
    >>> Rebulk().regex(r'Custom-separators', abbreviations=[("-", r"[\W_]+")]) \
    ...         .matches("Custom_separators using-abbreviations")
    [<Custom_separators:(0, 17)>]
    ```
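Since an abbreviation is a plain substitution in the expression, its effect can be reproduced with `str.replace` followed by `re.finditer`. The helper `expand_abbreviations` below is hypothetical and only mirrors the description above; it is not part of rebulk's API.

```python
import re


def expand_abbreviations(pattern, abbreviations):
    """Replace each abbreviation[0] with abbreviation[1], in list order."""
    for short, expanded in abbreviations:
        pattern = pattern.replace(short, expanded)
    return pattern


expanded = expand_abbreviations(r'Custom-separators', [("-", r"[\W_]+")])
print(expanded)  # Custom[\W_]+separators
spans = [m.span() for m in re.finditer(expanded, "Custom_separators using-abbreviations")]
print(spans)  # [(0, 17)]
```

Because the substitution is purely textual, in this sketch it would also rewrite a `-` inside a character class such as `[a-z]`, so abbreviation keys are best chosen so they do not collide with regex syntax.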

Functional Patterns
===================

Functional patterns are based on the evaluation of a function.

The function takes the same parameters as the `Rebulk.matches` method,
that is, the input string, and must return at least the start and end
indices of the `Match` object.

```python
>>> def func(string):
...     index = string.find('?')
...     if index > -1:
...         return 0, index - 11
>>> Rebulk().functional(func).matches("Why do simple ? Forget about it ...")
[<Why:(0, 3)>]
```

The function can also return a dict of keyword arguments for the `Match`
object.

You can define several patterns with a single `functional` method call,
and the function can return multiple matches.
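The return shapes described above can be illustrated with a standalone sketch: a function that returns several spans, and a hypothetical `normalize` helper that turns a single tuple, a `(start, end, kwargs)` tuple, a dict, or a list of those into keyword dicts. Neither function is part of rebulk, and the exact set of shapes rebulk accepts is an assumption here beyond what the text states.

```python
import re


def question_marks(string):
    """Example function returning several matches: one span per '?'."""
    return [(m.start(), m.end()) for m in re.finditer(r'\?', string)]


def normalize(ret):
    """Hypothetical helper: turn a function result into Match keyword dicts.

    Accepts a (start, end) or (start, end, kwargs) tuple, a dict of Match
    keyword arguments, or an iterable of those; a falsy result means no match.
    """
    if not ret:
        return []
    if isinstance(ret, dict) or (isinstance(ret, tuple) and isinstance(ret[0], int)):
        ret = [ret]  # a single match: wrap it so the loop below handles it
    results = []
    for item in ret:
        if isinstance(item, dict):
            results.append(dict(item))
            continue
        options = dict(item[-1]) if isinstance(item[-1], dict) else {}
        options.update(start=item[0], end=item[1])
        results.append(options)
    return results


print(normalize(question_marks("Why? Really?")))
# [{'start': 3, 'end': 4}, {'start': 11, 'end': 12}]
print(normalize((0, 3, {'name': 'why'})))
# [{'name': 'why', 'start': 0, 'end': 3}]
```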

Chain Patterns
==============

Chain patterns are ordered compositions of string, functional and regex
patterns. A repeater can be set on each part of the chain to define how
many times it may repeat.

```python
>>> r = Rebulk().regex_defaults(flags=re.IGNORECASE)\
...             .defaults(children=True, formatter={'episode': int, 'version': int})\
...             .chain()\
...             .regex(r'e(?P<episode>\d{1,4})').repeater(1)\
...             .regex(r'v(?P<version>\d+)').repeater('?')\
...             .regex(r'[ex-](?P<episode>\d{1,4})').repeater('*')\
...             .close() # .repeater(1) could be omitted as it's the default behavior
>>> r.matches("This is E14v2-15-16-17").to_dict()  # converts matches to dict
MatchesDict([('episode', [14, 15, 16, 17]), ('version', 2)])
```
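To make the three repeaters concrete (`1`, `'?'` and `'*'`), here is a plain `re` function that yields the same dict for this particular input. It is a hypothetical equivalent written for illustration, not how rebulk evaluates chains; the function name is invented.

```python
import re


def parse_episode_chain(text):
    """Plain-re equivalent of the chain above, for this example only.

    Part 1: e<episode> exactly once; part 2: v<version> optionally;
    part 3: [ex-]<episode> zero or more times.
    """
    head = re.search(r'e(?P<episode>\d{1,4})(?:v(?P<version>\d+))?((?:[ex-]\d{1,4})*)',
                     text, re.IGNORECASE)
    if head is None:
        return {}
    episodes = [int(head.group('episode'))]
    # The repeated part keeps only its last capture in re, so re-scan it.
    episodes += [int(n) for n in re.findall(r'[ex-](\d{1,4})', head.group(3), re.IGNORECASE)]
    result = {'episode': episodes}
    if head.group('version') is not None:
        result['version'] = int(head.group('version'))
    return result


print(parse_episode_chain("This is E14v2-15-16-17"))
# {'episode': [14, 15, 16, 17], 'version': 2}
```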

Patterns parameters
===================

All patterns have options that can be given as keyword arguments.

-   `validator`

    Function used to validate the `Match` produced by the pattern. It can
    also be a `dict` mapping match names to validators.

    ```python
    >>> def check_leap_year(match):
    ...     return int(match.value) in [1980, 1984, 1988]
    >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
    ...                   .matches("In year 1982 ...")
    >>> len(matches)
    0
    >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
    ...                   .matches("In year 1984 ...")
    >>> len(matches)
    1
    ```

Some base validator functions are available in the `rebulk.validators`
module. Most of them must be configured with `functools.partial` so
that they become functions accepting a single `match` argument.
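The `functools.partial` technique can be sketched without rebulk. Both `value_in_range` and the `FakeMatch` stand-in are hypothetical (they are not functions or classes from `rebulk.validators`); the point is only that binding the leading configuration arguments leaves a one-argument callable of the shape a `validator` expects.

```python
import functools
from collections import namedtuple

# Stand-in for rebulk's Match; only the attribute used below is modeled.
FakeMatch = namedtuple('FakeMatch', ['value'])


def value_in_range(minimum, maximum, match):
    """Hypothetical validator: configuration first, match last."""
    return minimum <= int(match.value) <= maximum


# partial() binds the configuration, leaving a one-argument validator.
eighties = functools.partial(value_in_range, 1980, 1989)

print(eighties(FakeMatch('1984')))  # True
print(eighties(FakeMatch('1992')))  # False
```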

-   `formatter`

    Function used to convert the `Match` value produced by the pattern. It
    can also be a `dict` mapping match names to formatters.

    ```python
    >>> def year_formatter(value):
    ...     return int(value)
    >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
    ...                   .matches("In year 1982 ...")
    >>> isinstance(matches[0].value, int)
    True
    ```

-   `pre_match_processor` / `post_match_processor`

    Function to mutate or invalidate a match generated by a pattern.

    The function takes a single parameter, the `Match` object. If it
    returns `False`, the match is considered invalid. If it returns a
    `Match` instance, that instance replaces the original match.

-   `post_processor`

    Function used to change the default output of the pattern. It
    receives the list of matches and the `Pattern` object.

-   `name`

    The name of the pattern. It is automatically passed to `Match`
    objects generated by this pattern.

-   `tags`

    A list of strings that qualify this pattern.

-   `value`

    Overrides the `value` property of generated `Match` objects. It can
    also be a `dict` mapping match names to values.

-   `validate_all`

    By default, the validator is called only for returned `Match` objects.
    Enable this option to validate all of them, parents and children
    included.

-   `format_all`

    By default, the formatter is called only for returned `Match` values.
    Enable this option to format all of them, parents and children included.

-   `disabled`

    A `function(context)` that disables the pattern when it returns `True`.

-   `children`

    If `True`, all children `Match` objects will be retrieved instead of
    a single parent `Match` object.

-   `private`

    If `True`, `Match` objects generated from this pattern are available
    internally only. They will be removed at the end of `Rebulk.matches`
    method call.

-   `private_parent`

    Force parent matches to be returned and flag them as private.

-   `private_children`

    Force children matches to be returned and flag them as private.

-   `private_names`

    Match names that will be declared private.

-   `ignore_names`

    Match names that will be ignored in the pattern output, after
    validation.

-   `marker`

    If `True`, `Match` objects generated from this pattern will be
    marker matches instead of standard matches. They won't be included
    in the `Matches` sequence, but will be available in the `Matches.markers`
    sequence (see `Markers` section).
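The `pre_match_processor` / `post_match_processor` contract described above (return `False` to drop, return a match to replace) can be modeled with a small stand-in. `SimpleMatch`, `run_processor` and `shout_long_words` are hypothetical names used only for this sketch; in particular, keeping the original match for any other return value is an assumption, since the text does not specify that case.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimpleMatch:
    """Minimal stand-in for a rebulk Match (illustration only)."""
    start: int
    end: int
    value: str


def run_processor(matches, processor):
    """Apply a match processor following the rules described above."""
    kept = []
    for match in matches:
        result = processor(match)
        if result is False:
            continue  # False: the match is considered invalid and dropped
        if isinstance(result, SimpleMatch):
            kept.append(result)  # a match instance replaces the original
        else:
            kept.append(match)  # assumption: any other result keeps it as-is
    return kept


def shout_long_words(match):
    """Drop values shorter than 3 characters, upper-case the others."""
    if len(match.value) < 3:
        return False
    return replace(match, value=match.value.upper())


matches = [SimpleMatch(0, 2, 'la'), SimpleMatch(4, 10, 'lakers')]
print(run_processor(matches, shout_long_words))
# [SimpleMatch(start=4, end=10, value='LAKERS')]
```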

Match
=====

A `Match` object is the result created by a registered pattern.

It has a `value` property, and its position indices are available
through the `start`, `end` and `span` properties.

In some cases, it contains child `Match` objects in its `children`
property, and each child `Match` object references its parent through
its `parent` property. A match can also have a `name` property.

If groups are defined in a regular expression pattern, each group match
is converted to its own `Match` object. If a group has a name
(`(?P<name>group)`), it is set as the `name` property of the child
`Match` object. The whole regexp match (`match.group(0)`) is converted
to the main `Match` object, and all subgroups (1, 2, ..., n) are
converted to `children` matches of the main `Match` object.

```python
>>> matches = Rebulk() \
...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)") \
...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
>>> matches
[<One, 1, Two, 2, Three, 3:(9, 33)>]
>>> for child in matches[0].children:
...     '%s = %s' % (child.name, child.value)
'one = 1'
'two = 2'
'three = 3'
```

It's possible to retrieve only the children by using the `children`
parameter. You can also customize how the structure is generated with
the `every`, `private_parent` and `private_children` parameters.

```python
>>> matches = Rebulk() \
...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)", children=True) \
...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
>>> matches
[<1:(14, 15)+name=one+initiator=One, 1, Two, 2, Three, 3>, <2:(22, 23)+name=two+initiator=One, 1, Two, 2, Three, 3>, <3:(32, 33)+name=three+initiator=One, 1, Two, 2, Three, 3>]
```

`Match` objects have the following properties, which can also be given
to `Pattern` objects:

-   `formatter`

    Function used to convert the `Match` value produced by the pattern. It
    can also be a `dict` mapping match names to formatters.

    ```python
    >>> def year_formatter(value):
    ...     return int(value)
    >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
    ...                   .matches("In year 1982 ...")
    >>> isinstance(matches[0].value, int)
    True
    ```

-   `format_all`

    By default, the formatter is called only for returned `Match` values.
    Enable this option to format all of them, parents and children included.

-   `conflict_solver`

    A `function(match, conflicting_match)` used to solve conflicts. The
    returned object is removed from the matches by the default
    `ConflictSolver` rule. If the string `__default__` is returned, it
    falls back to the default behavior of keeping the longer match.
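The `conflict_solver` contract can be illustrated on plain `(start, end)` spans. `resolve` and `prefer_shorter` are hypothetical names, not rebulk's API; the sketch assumes that a solver returning `None` means nothing is removed, which the text above does not state.

```python
def resolve(match, other, conflict_solver=None):
    """Return the span to remove for two conflicting spans.

    Mirrors the described contract: the solver returns the match to drop,
    or the string '__default__' to fall back to keeping the longer one.
    """
    if conflict_solver is not None:
        to_remove = conflict_solver(match, other)
        if to_remove != '__default__':
            return to_remove  # assumption: None means nothing is removed
    # Default behavior: keep the longer span, i.e. remove the shorter one.
    if (match[1] - match[0]) < (other[1] - other[0]):
        return match
    return other


def prefer_shorter(match, conflicting_match):
    """Hypothetical solver that inverts the default and removes the longer."""
    len_a = match[1] - match[0]
    len_b = conflicting_match[1] - conflicting_match[0]
    if len_a == len_b:
        return '__default__'
    return match if len_a > len_b else conflicting_match


print(resolve((4, 10), (4, 6)))                  # (4, 6): default removes the shorter
print(resolve((4, 10), (4, 6), prefer_shorter))  # (4, 10)
```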

Matches
=======

A `Matches` object holds the result of a `Rebulk.matches` method call.
It's a sequence of `Match` objects and it behaves like a list.

All methods accept a `predicate` callable to filter `Match` objects,
and an `index` int to retrieve a single element from the returned
matches instead of a list.

It has the following additional methods and properties:

-   `starting(start, predicate=None, index=None)`

    Retrieves a list of `Match` objects that start at the given position.

-   `ending(end, predicate=None, index=None)`

    Retrieves a list of `Match` objects that end at the given position.

-   `previous(match, predicate=None, index=None)`

    Retrieves a list of `Match` objects preceding the given match and
    nearest to it.

-   `next(match, predicate=None, index=None)`

    Retrieves a list of `Match` objects following the given match and
    nearest to it.

-   `tagged(tag, predicate=None, index=None)`

    Retrieves a list of `Match` objects that have the given tag defined.

-   `named(name, predicate=None, index=None)`

    Retrieves a list of `Match` objects that have the given name.

-   `range(start=0, end=None, predicate=None, index=None)`

    Retrieves a list of `Match` objects for given range, sorted from
    start to end.

-   `holes(start=0, end=None, formatter=None, ignore=None, predicate=None, index=None)`

    Retrieves a list of *hole* `Match` objects for given range. A hole
    match is created for each range where no match is available.

-   `conflicting(match, predicate=None, index=None)`

    Retrieves a list of `Match` objects that conflict with the given match.

-   `chain_before(position, seps, start=0, predicate=None, index=None)`

    Retrieves a list of chained matches before `position` that match the
    predicate and are separated only by characters from `seps`.

-   `chain_after(position, seps, end=None, predicate=None, index=None)`

    Retrieves a list of chained matches after `position` that match the
    predicate and are separated only by characters from `seps`.

-   `at_match(match, predicate=None, index=None)`

    Retrieves a list of `Match` objects at the same position as match.

-   `at_span(span, predicate=None, index=None)`

    Retrieves a list of `Match` objects from given (start, end) tuple.

-   `at_index(pos, predicate=None, index=None)`

    Retrieves a list of `Match` objects from given position.

-   `names`

    Retrieves a sequence of all `Match.name` properties.

-   `tags`

    Retrieves a sequence of all `Match.tags` properties.

-   `to_dict(details=False, first_value=False, enforce_list=False)`

    Converts to an ordered dict, with `Match.name` as key and
    `Match.value` as value.

    It's a subclass of
    [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict)
    that contains a `matches` property, which is a dict with `Match.name`
    as key and a list of `Match` objects as value.

    If `first_value` is `False` (the default) and distinct values are
    found for the same name, the values are wrapped in a list. If `True`,
    only the first value is kept. Lists of values can also be retrieved
    with `values_list`, which is a dict with `Match.name` as key and a
    list of `Match.value` as value.

    If `enforce_list` is `True`, all values are wrapped in a list, even
    when a single value is found.

    If `details` is `True`, `Match.value` objects are replaced with the
    complete `Match` objects.

-   `markers`

    A custom `Matches` sequence specialized for marker matches (see
    below).
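Two of the behaviors above, `holes` and the `first_value` / `enforce_list` options of `to_dict`, can be sketched on plain data. These functions are hypothetical re-implementations that work on `(start, end)` spans and `(name, value)` pairs assumed to be in position order; they are not rebulk's `Matches` methods and they ignore `predicate`, `index`, `formatter` and `details`.

```python
def holes(spans, start, end):
    """Return the (start, end) ranges of [start, end) not covered by spans."""
    result = []
    cursor = start
    for s, e in sorted(spans):
        if s > cursor:
            result.append((cursor, min(s, end)))
        cursor = max(cursor, e)
        if cursor >= end:
            break
    if cursor < end:
        result.append((cursor, end))
    return result


def to_dict(pairs, first_value=False, enforce_list=False):
    """Collect (name, value) pairs, given in position order, into a dict."""
    result = {}
    for name, value in pairs:
        if name not in result:
            result[name] = [value] if enforce_list else value
        elif not first_value:
            current = result[name]
            if not isinstance(current, list):
                if current == value:
                    continue  # same value seen again: nothing to add
                current = result[name] = [current]
            if value not in current:
                current.append(value)
    return result


print(holes([(4, 9), (10, 15)], 0, 20))  # [(0, 4), (9, 10), (15, 20)]
pairs = [('episode', 14), ('version', 2), ('episode', 15)]
print(to_dict(pairs))                    # {'episode': [14, 15], 'version': 2}
print(to_dict(pairs, first_value=True))  # {'episode': 14, 'version': 2}
```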

Markers
=======

If some patterns are defined with the `marker` option, then
`Matches.markers` points to a special `Matches` sequence that contains
only marker matches. This sequence supports all `Matches` methods.

Marker matches are not intended to be part of the final result, but
can be used to implement a `Rule`.

Rules
=====

Rules are a convenient and readable way to implement advanced
conditional logic involving several `Match` objects. When a rule is
triggered, it can perform an action on the `Matches` object, such as
filtering matches out, adding tags or renaming them.

Rules are implemented by extending the abstract `Rule` class. They are
registered with the `Rebulk.rules` method by giving either a `Rule`
instance, a `Rule` class or a module containing only `Rule` classes.

For a rule to be triggered, `Rule.when` must return `True`, a
non-empty list of `Match` objects, or any other truthy object. When
triggered, `Rule.then` is called to perform the action, with its
`when_response` parameter set to the value returned by `Rule.when`.

Instead of implementing the `Rule.then` method, you can define a
`consequence` class property with a `Consequence` class or instance,
such as `RemoveMatch`, `RenameMatch` or `AppendMatch`. You can also use
a list of consequences when required: `when_response` must then be
iterable, and its elements are given to each consequence in the same
order.

When many rules are registered, it can be useful to set the `priority`
class variable to an integer that orders rule execution (higher
priorities are executed first). You can also define `dependency` to
declare another `Rule` class as a dependency of the current rule,
meaning that it will be executed before.

For rules sharing the same `priority` value, all `when` methods are
called first, and all `then` methods are called afterwards.

```python
>>> from rebulk import Rule, RemoveMatch

>>> class FirstOnlyRule(Rule):
...     consequence = RemoveMatch
...
...     def when(self, matches, context):
...         grabbed = matches.named("grabbed", 0)
...         if grabbed and matches.previous(grabbed):
...             return grabbed

>>> rebulk = Rebulk()

>>> rebulk.regex("This match(.*?)grabbed", name="grabbed")
<...Rebulk object at ...>
>>> rebulk.regex("if it's(.*?)first match", private=True)
<...Rebulk object at ...>
>>> rebulk.rules(FirstOnlyRule)
<...Rebulk object at ...>

>>> rebulk.matches("This match is grabbed only if it's the first match")
[<This match is grabbed:(0, 21)+name=grabbed>]
>>> rebulk.matches("if it's NOT the first match, This match is NOT grabbed")
[]
```
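The ordering rules (higher `priority` first, and within one priority every `when` before any `then`) can be modeled with a tiny executor. `SketchRule`, `execute_rules`, `RemoveShort` and `MarkIfAlone` are hypothetical and operate on a list of strings instead of a `Matches` object; `dependency` handling is deliberately left out of this sketch.

```python
from itertools import groupby


class SketchRule:
    """Hypothetical minimal rule: when() decides, then() acts."""
    priority = 0

    def when(self, matches, context):
        raise NotImplementedError

    def then(self, matches, when_response, context):
        raise NotImplementedError


def execute_rules(rules, matches, context=None):
    """Run rules by descending priority; within one priority level, call
    every when() first, then then() for each rule whose response is truthy."""
    ordered = sorted(rules, key=lambda r: r.priority, reverse=True)
    for _, group in groupby(ordered, key=lambda r: r.priority):
        responses = [(rule, rule.when(matches, context)) for rule in group]
        for rule, response in responses:
            if response:  # True, a non-empty list, or any truthy object
                rule.then(matches, response, context)
    return matches


class RemoveShort(SketchRule):
    priority = 10

    def when(self, matches, context):
        return [m for m in matches if len(m) < 3]

    def then(self, matches, when_response, context):
        for m in when_response:
            matches.remove(m)


class MarkIfAlone(SketchRule):
    def when(self, matches, context):
        return len(matches) == 1

    def then(self, matches, when_response, context):
        matches.append('alone')


print(execute_rules([MarkIfAlone(), RemoveShort()], ['la', 'lakers']))
# ['lakers', 'alone']
```

With both rules at the same priority, `MarkIfAlone.when` would run before `RemoveShort.then` removes `'la'`, so `'alone'` would not be appended; this is the "`when` first, `then` afterwards" behavior described above.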


Changelog
=========

<!--next-version-placeholder-->

## v3.2.0 (2023-02-18)
### Feature
* **dependencies:** Add python 3.11 support and drop python 3.6 support ([`e4cb0d8`](https://github.com/Toilal/rebulk/commit/e4cb0d854cd8ea80da9abe46d2b3405a873e2020))

### Fix
* Remove pytest-runner from setup_requires ([`4483d17`](https://github.com/Toilal/rebulk/commit/4483d1777f6a61d20ed83da760663aec67e22042))

## v3.1.0 (2021-11-04)
### Feature
* **defaults:** Add overrides support ([#25](https://github.com/Toilal/rebulk/issues/25)) ([`f79e5ea`](https://github.com/Toilal/rebulk/commit/f79e5eab0806787ff19a4c668bf9f88413b67288))
* **python:** Add python 3.10 support, drop python 3.5 support ([`a5e6eb7`](https://github.com/Toilal/rebulk/commit/a5e6eb7bba979ee51e1c6c1e186bd224c989dfdc))

## v3.0.1 (2020-12-25)
### Fix
* **package:** Fix broken package `No such file or directory: 'CHANGELOG.md'` ([#24](https://github.com/Toilal/rebulk/issues/24)) ([`33895ff`](https://github.com/Toilal/rebulk/commit/33895ff358ff5051768fb98d4e840691e7af9bdf))

### Documentation
* **readme:** Add semantic release badge ([`78baca0`](https://github.com/Toilal/rebulk/commit/78baca0c529083d7f583ffec58aeb23734d67ce5))
* **readme:** Fix title ([`d5d4db5`](https://github.com/Toilal/rebulk/commit/d5d4db5cd7f6e2cb1308acd26bfb98838815fad4))

## v3.0.0 (2020-12-23)
### Feature
* **regex:** Replace REGEX_DISABLED environment variable with REBULK_REGEX_ENABLED ([`d5a8cad`](https://github.com/Toilal/rebulk/commit/d5a8cad6281533ee549a46ca70e1a25e5777eda3))
* Add python 3.8/3.9 support, drop python 2.7/3.4 support ([`048a15f`](https://github.com/Toilal/rebulk/commit/048a15f90833ba8d33ea84d56e9955d31b514dc3))

### Breaking
* regex module is now disabled by default, even if it's available in the python interpreter. You have to set REBULK_REGEX_ENABLED=1 in your environment to enable it, as this module may cause some issues.  ([`d5a8cad`](https://github.com/Toilal/rebulk/commit/d5a8cad6281533ee549a46ca70e1a25e5777eda3))
* Python 2.7 and 3.4 support have been dropped  ([`048a15f`](https://github.com/Toilal/rebulk/commit/048a15f90833ba8d33ea84d56e9955d31b514dc3))

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/Toilal/rebulk/",
    "name": "rebulk",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "re regexp regular expression search pattern string match",
    "author": "R\u00e9mi Alvergnat",
    "author_email": "toilal.dev@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/f2/06/24c69f8d707c9eefc1108a64e079da56b5f351e3f59ed76e8f04b9f3e296/rebulk-3.2.0.tar.gz",
    "platform": null,
    "description": "ReBulk\n======\n\n[![Latest Version](http://img.shields.io/pypi/v/rebulk.svg)](https://pypi.python.org/pypi/rebulk)\n[![MIT License](http://img.shields.io/badge/license-MIT-blue.svg)](https://pypi.python.org/pypi/rebulk)\n[![Build Status](https://img.shields.io/github/workflow/status/Toilal/rebulk/ci)](https://github.com/Toilal/rebulk/actions?query=workflow%3Aci)\n[![Coveralls](http://img.shields.io/coveralls/Toilal/rebulk.svg)](https://coveralls.io/r/Toilal/rebulk?branch=master)\n[![semantic-release](https://img.shields.io/badge/%20%20%F0%9F%93%A6%F0%9F%9A%80-semantic--release-e10079.svg)](https://github.com/relekang/python-semantic-release)\n\n\nReBulk is a python library that performs advanced searches in strings\nthat would be hard to implement using [re\nmodule](https://docs.python.org/3/library/re.html) or [String\nmethods](https://docs.python.org/3/library/stdtypes.html#str) only.\n\nIt includes some features like `Patterns`, `Match`, `Rule` that allows\ndevelopers to build a custom and complex string matcher using a readable\nand extendable API.\n\nThis project is hosted on GitHub: <https://github.com/Toilal/rebulk>\n\nInstall\n=======\n\n```sh\n$ pip install rebulk\n```\n\nUsage\n=====\n\nRegular expression, string and function based patterns are declared in a\n`Rebulk` object. 
It use a fluent API to chain `string`, `regex`, and\n`functional` methods to define various patterns types.\n\n```python\n>>> from rebulk import Rebulk\n>>> bulk = Rebulk().string('brown').regex(r'qu\\w+').functional(lambda s: (20, 25))\n```\n\nWhen `Rebulk` object is fully configured, you can call `matches` method\nwith an input string to retrieve all `Match` objects found by registered\npattern.\n\n```python\n>>> bulk.matches(\"The quick brown fox jumps over the lazy dog\")\n[<brown:(10, 15)>, <quick:(4, 9)>, <jumps:(20, 25)>]\n```\n\nIf multiple `Match` objects are found at the same position, only the\nlonger one is kept.\n\n```python\n>>> bulk = Rebulk().string('lakers').string('la')\n>>> bulk.matches(\"the lakers are from la\")\n[<lakers:(4, 10)>, <la:(20, 22)>]\n```\n\nString Patterns\n===============\n\nString patterns are based on\n[str.find](https://docs.python.org/3/library/stdtypes.html#str.find)\nmethod to find matches, but returns all matches in the string.\n`ignore_case` can be enabled to ignore case.\n\n```python\n>>> Rebulk().string('la').matches(\"lalalilala\")\n[<la:(0, 2)>, <la:(2, 4)>, <la:(6, 8)>, <la:(8, 10)>]\n\n>>> Rebulk().string('la').matches(\"LalAlilAla\")\n[<la:(8, 10)>]\n\n>>> Rebulk().string('la', ignore_case=True).matches(\"LalAlilAla\")\n[<La:(0, 2)>, <lA:(2, 4)>, <lA:(6, 8)>, <la:(8, 10)>]\n```\n\nYou can define several patterns with a single `string` method call.\n\n```python\n>>> Rebulk().string('Winter', 'coming').matches(\"Winter is coming...\")\n[<Winter:(0, 6)>, <coming:(10, 16)>]\n```\n\nRegular Expression Patterns\n===========================\n\nRegular Expression patterns are based on a compiled regular expression.\n[re.finditer](https://docs.python.org/3/library/re.html#re.finditer)\nmethod is used to find matches.\n\nIf [regex module](https://pypi.python.org/pypi/regex) is available, it\ncan be used by rebulk instead of default [re\nmodule](https://docs.python.org/3/library/re.html). 
Enable it with `REBULK_REGEX_ENABLED=1` environment variable.\n\n```python\n>>> Rebulk().regex(r'l\\w').matches(\"lolita\")\n[<lo:(0, 2)>, <li:(2, 4)>]\n```\n\nYou can define several patterns with a single `regex` method call.\n\n```python\n>>> Rebulk().regex(r'Wint\\wr', r'com\\w{3}').matches(\"Winter is coming...\")\n[<Winter:(0, 6)>, <coming:(10, 16)>]\n```\n\nAll keyword arguments from\n[re.compile](https://docs.python.org/3/library/re.html#re.compile) are\nsupported.\n\n```python\n>>> import re  # import required for flags constant\n>>> Rebulk().regex('L[A-Z]KERS', flags=re.IGNORECASE) \\\n...         .matches(\"The LaKeRs are from La\")\n[<LaKeRs:(4, 10)>]\n\n>>> Rebulk().regex('L[A-Z]', 'L[A-Z]KERS', flags=re.IGNORECASE) \\\n...         .matches(\"The LaKeRs are from La\")\n[<La:(20, 22)>, <LaKeRs:(4, 10)>]\n\n>>> Rebulk().regex(('L[A-Z]', re.IGNORECASE), ('L[a-z]KeRs')) \\\n...         .matches(\"The LaKeRs are from La\")\n[<La:(20, 22)>, <LaKeRs:(4, 10)>]\n```\n\nIf [regex module](https://pypi.python.org/pypi/regex) is available, it\nautomatically supports repeated captures.\n\n```python\n>>> # If regex module is available, repeated_captures is True by default.\n>>> matches = Rebulk().regex(r'(\\d+)(?:-(\\d+))+').matches(\"01-02-03-04\")\n>>> matches[0].children # doctest:+SKIP\n[<01:(0, 2)>, <02:(3, 5)>, <03:(6, 8)>, <04:(9, 11)>]\n\n>>> # If regex module is not available, or if repeated_captures is forced to False.\n>>> matches = Rebulk().regex(r'(\\d+)(?:-(\\d+))+', repeated_captures=False) \\\n...                   .matches(\"01-02-03-04\")\n>>> matches[0].children\n[<01:(0, 2)+initiator=01-02-03-04>, <04:(9, 11)+initiator=01-02-03-04>]\n```\n\n-   `abbreviations`\n\n    Defined as a list of 2-tuple, each tuple is an abbreviation. 
It\n    simply replace `tuple[0]` with `tuple[1]` in the expression.\n\n    \\>\\>\\> Rebulk().regex(r\\'Custom-separators\\',\n    abbreviations=\\[(\\\"-\\\", r\\\"\\[W\\_\\]+\\\")\\])\\...\n    .matches(\\\"Custom\\_separators using-abbreviations\\\")\n    \\[\\<Custom\\_separators:(0, 17)\\>\\]\n\nFunctional Patterns\n===================\n\nFunctional Patterns are based on the evaluation of a function.\n\nThe function should have the same parameters as `Rebulk.matches` method,\nthat is the input string, and must return at least start index and end\nindex of the `Match` object.\n\n```python\n>>> def func(string):\n...     index = string.find('?')\n...     if index > -1:\n...         return 0, index - 11\n>>> Rebulk().functional(func).matches(\"Why do simple ? Forget about it ...\")\n[<Why:(0, 3)>]\n```\n\nYou can also return a dict of keywords arguments for `Match` object.\n\nYou can define several patterns with a single `functional` method call,\nand function used can return multiple matches.\n\nChain Patterns\n==============\n\nChain Patterns are ordered composition of string, functional and regex\npatterns. Repeater can be set to define repetition on chain part.\n\n```python\n>>> r = Rebulk().regex_defaults(flags=re.IGNORECASE)\\\n...             .defaults(children=True, formatter={'episode': int, 'version': int})\\\n...             .chain()\\\n...             .regex(r'e(?P<episode>\\d{1,4})').repeater(1)\\\n...             .regex(r'v(?P<version>\\d+)').repeater('?')\\\n...             .regex(r'[ex-](?P<episode>\\d{1,4})').repeater('*')\\\n...             
.close() # .repeater(1) could be omitted as it's the default behavior\n>>> r.matches(\"This is E14v2-15-16-17\").to_dict()  # converts matches to dict\nMatchesDict([('episode', [14, 15, 16, 17]), ('version', 2)])\n```\n\nPatterns parameters\n===================\n\nAll patterns have options that can be given as keyword arguments.\n\n-   `validator`\n\n    Function to validate `Match` value given by the pattern. Can also be\n    a `dict`, to use `validator` with pattern named with key.\n\n    ```python\n    >>> def check_leap_year(match):\n    ...     return int(match.value) in [1980, 1984, 1988]\n    >>> matches = Rebulk().regex(r'\\d{4}', validator=check_leap_year) \\\n    ...                   .matches(\"In year 1982 ...\")\n    >>> len(matches)\n    0\n    >>> matches = Rebulk().regex(r'\\d{4}', validator=check_leap_year) \\\n    ...                   .matches(\"In year 1984 ...\")\n    >>> len(matches)\n    1\n    ```\n\nSome base validator functions are available in `rebulk.validators`\nmodule. Most of those functions have to be configured using\n`functools.partial` to map them to function accepting a single `match`\nargument.\n\n-   `formatter`\n\n    Function to convert `Match` value given by the pattern. Can also be\n    a `dict`, to use `formatter` with matches named with key.\n\n    ```python\n    >>> def year_formatter(value):\n    ...     return int(value)\n    >>> matches = Rebulk().regex(r'\\d{4}', formatter=year_formatter) \\\n    ...                   .matches(\"In year 1982 ...\")\n    >>> isinstance(matches[0].value, int)\n    True\n    ```\n\n-   `pre_match_processor` / `post_match_processor`\n\n    Function to mutagen or invalidate a match generated by a pattern.\n\n    Function has a single parameter which is the Match object. 
If\n    function returns False, it will be considered as an invalid match.\n    If function returns a match instance, it will replace the original\n    match with this instance in the process.\n\n-   `post_processor`\n\n    Function to change the default output of the pattern. Function\n    parameters are Matches list and Pattern object.\n\n-   `name`\n\n    The name of the pattern. It is automatically passed to `Match`\n    objects generated by this pattern.\n\n-   `tags`\n\n    A list of string that qualifies this pattern.\n\n-   `value`\n\n    Override value property for generated `Match` objects. Can also be a\n    `dict`, to use `value` with pattern named with key.\n\n-   `validate_all`\n\n    By default, validator is called for returned `Match` objects only.\n    Enable this option to validate them all, parent and children\n    included.\n\n-   `format_all`\n\n    By default, formatter is called for returned `Match` values only.\n    Enable this option to format them all, parent and children included.\n\n-   `disabled`\n\n    A `function(context)` to disable the pattern if returning `True`.\n\n-   `children`\n\n    If `True`, all children `Match` objects will be retrieved instead of\n    a single parent `Match` object.\n\n-   `private`\n\n    If `True`, `Match` objects generated from this pattern are available\n    internally only. They will be removed at the end of `Rebulk.matches`\n    method call.\n\n-   `private_parent`\n\n    Force parent matches to be returned and flag them as private.\n\n-   `private_children`\n\n    Force children matches to be returned and flag them as private.\n\n-   `private_names`\n\n    Matches names that will be declared as private\n\n-   `ignore_names`\n\n    Matches names that will be ignored from the pattern output, after\n    validation.\n\n-   `marker`\n\n    If `true`, `Match` objects generated from this pattern will be\n    markers matches instead of standard matches. 
They won\\'t be included\n    in the `Matches` sequence, but will be available in the\n    `Matches.markers` sequence (see `Markers` section).\n\nMatch\n=====\n\nA `Match` object is the result created by a registered pattern.\n\nIt has a `value` property, and position indices are available through\nthe `start`, `end` and `span` properties.\n\nIn some cases, it contains child `Match` objects in its `children`\nproperty, and each child `Match` object references its parent through\nits `parent` property. A `name` property can also be defined for the\nmatch.\n\nIf groups are defined in a regular expression pattern, each group match\nwill be converted to a single `Match` object. If a group has a name\ndefined (`(?P<name>group)`), it is set as the `name` property of the\nchild `Match` object. The whole regexp match (`group(0)`) will be\nconverted to the main `Match` object, and all subgroups (1, 2, \\... n)\nwill be converted to `children` matches of the main `Match` object.\n\n```python\n>>> matches = Rebulk() \\\n...         .regex(r\"One, (?P<one>\\w+), Two, (?P<two>\\w+), Three, (?P<three>\\w+)\") \\\n...         .matches(\"Zero, 0, One, 1, Two, 2, Three, 3, Four, 4\")\n>>> matches\n[<One, 1, Two, 2, Three, 3:(9, 33)>]\n>>> for child in matches[0].children:\n...     '%s = %s' % (child.name, child.value)\n'one = 1'\n'two = 2'\n'three = 3'\n```\n\nIt\\'s possible to retrieve only children by using the `children`\nparameter. You can also customize the way the structure is generated\nwith the `every`, `private_parent` and `private_children` parameters.\n\n```python\n>>> matches = Rebulk() \\\n...         .regex(r\"One, (?P<one>\\w+), Two, (?P<two>\\w+), Three, (?P<three>\\w+)\", children=True) \\\n...         .matches(\"Zero, 0, One, 1, Two, 2, Three, 3, Four, 4\")\n>>> matches\n[<1:(14, 15)+name=one+initiator=One, 1, Two, 2, Three, 3>, <2:(22, 23)+name=two+initiator=One, 1, Two, 2, Three, 3>, <3:(32, 33)+name=three+initiator=One, 1, Two, 2, Three, 3>]\n```\n\n
`Match` objects have the following properties, which can also be given\nto `Pattern` objects:\n\n-   `formatter`\n\n    Function to convert the `Match` value given by the pattern. Can also\n    be a `dict` keyed by match name, to use a specific formatter for each\n    named match.\n\n    ```python\n    >>> def year_formatter(value):\n    ...     return int(value)\n    >>> matches = Rebulk().regex(r'\\d{4}', formatter=year_formatter) \\\n    ...                   .matches(\"In year 1982 ...\")\n    >>> isinstance(matches[0].value, int)\n    True\n    ```\n\n-   `format_all`\n\n    By default, the formatter is called for returned `Match` values only.\n    Enable this option to format them all, parent and children included.\n\n-   `conflict_solver`\n\n    A `function(match, conflicting_match)` used to solve conflicts. The\n    returned object is removed from matches by the default\n    `ConflictSolver` rule. 
If the string `__default__` is returned, it falls\n    back to the default behavior of keeping the longer match.\n\nMatches\n=======\n\nA `Matches` object holds the result of a `Rebulk.matches` method call.\nIt\\'s a sequence of `Match` objects and it behaves like a list.\n\nAll methods accept a `predicate` callable to filter `Match` objects, and\nan `index` int to retrieve a single element from the matches that would\notherwise be returned.\n\nIt has the following additional methods and properties:\n\n-   `starting(start, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that start at the given index.\n\n-   `ending(end, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that end at the given index.\n\n-   `previous(match, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that are before and nearest to\n    match.\n\n-   `next(match, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that are after and nearest to\n    match.\n\n-   `tagged(tag, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that have the given tag defined.\n\n-   `named(name, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that have the given name.\n\n-   `range(start=0, end=None, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects for the given range, sorted from\n    start to end.\n\n-   `holes(start=0, end=None, formatter=None, ignore=None, predicate=None, index=None)`\n\n    Retrieves a list of *hole* `Match` objects for the given range. 
A hole\n    match is created for each range where no match is available.\n\n-   `conflicting(match, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects that conflict with the given\n    match.\n\n-   `chain_before(position, seps, start=0, predicate=None, index=None)`\n\n    Retrieves a list of chained matches before `position`, matching\n    `predicate` and separated only by characters from `seps`.\n\n-   `chain_after(position, seps, end=None, predicate=None, index=None)`\n\n    Retrieves a list of chained matches after `position`, matching\n    `predicate` and separated only by characters from `seps`.\n\n-   `at_match(match, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects at the same position as match.\n\n-   `at_span(span, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects for the given (start, end) tuple.\n\n-   `at_index(pos, predicate=None, index=None)`\n\n    Retrieves a list of `Match` objects at the given position.\n\n-   `names`\n\n    Retrieves a sequence of all `Match.name` properties.\n\n-   `tags`\n\n    Retrieves a sequence of all `Match.tags` properties.\n\n-   `to_dict(details=False, first_value=False, enforce_list=False)`\n\n    Converts to an ordered dict, with `Match.name` as key and\n    `Match.value` as value.\n\n    It\\'s a subclass of\n    [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict)\n    that contains a `matches` property, which is a dict with `Match.name`\n    as key and a list of `Match` objects as value.\n\n    If `first_value` is `False` (the default) and distinct values are\n    found for the same name, the values are wrapped in a list. 
If `True`, only\n    the first value is kept.\n\n    If `enforce_list` is `True`, all values are wrapped in a list, even\n    when a single value is found. Otherwise, the lists of values remain\n    available through `values_list`, a dict with `Match.name` as key and\n    a list of `Match.value` as value.\n\n    If `details` is `True`, `Match.value` objects are replaced with\n    complete `Match` objects.\n\n-   `markers`\n\n    A custom `Matches` sequence specialized for marker matches (see\n    below).\n\nMarkers\n=======\n\nIf you have defined some patterns with the `marker` parameter, then\n`Matches.markers` points to a special `Matches` sequence that contains\nonly marker matches. This sequence supports all methods from\n`Matches`.\n\nMarker matches are not intended to appear in the final result, but they\ncan be used to implement a `Rule`.\n\nRules\n=====\n\nRules are a convenient and readable way to implement advanced\nconditional logic involving several `Match` objects. When a rule is\ntriggered, it can perform an action on the `Matches` object, like\nfiltering out, adding tags or renaming.\n\nRules are implemented by extending the abstract `Rule` class. They are\nregistered using the `Rebulk.rules` method by giving either a `Rule`\ninstance, a `Rule` class or a module containing `Rule` classes only.\n\nFor a rule to be triggered, the `Rule.when` method must return `True`, a\nnon-empty list of `Match` objects, or any other truthy object. When\ntriggered, the `Rule.then` method is called to perform the action, with\nits `when_response` parameter set to the response of the `Rule.when`\ncall.\n\nInstead of implementing the `Rule.then` method, you can define a\n`consequence` class property with a consequence class or instance, like\n`RemoveMatch`, `RenameMatch` or `AppendMatch`. 
You can also use a list\nof consequences when required: `when_response` must then be iterable,\nand its elements are given to each consequence in the same order.\n\nWhen many rules are registered, it can be useful to set the `priority`\nclass variable to an integer that orders rule executions (higher\npriorities are executed first). You can also define `dependency` to\ndeclare another `Rule` class as a dependency of the current rule,\nmeaning that the dependency is executed before it.\n\nFor all rules with the same `priority` value, every `when` method is\ncalled first, and every `then` method is called afterwards.\n\n```python\n>>> from rebulk import Rule, RemoveMatch\n\n>>> class FirstOnlyRule(Rule):\n...     consequence = RemoveMatch\n...\n...     def when(self, matches, context):\n...         grabbed = matches.named(\"grabbed\", 0)\n...         if grabbed and matches.previous(grabbed):\n...             return grabbed\n\n>>> rebulk = Rebulk()\n\n>>> rebulk.regex(\"This match(.*?)grabbed\", name=\"grabbed\")\n<...Rebulk object at ...>\n>>> rebulk.regex(\"if it's(.*?)first match\", private=True)\n<...Rebulk object at ...>\n>>> rebulk.rules(FirstOnlyRule)\n<...Rebulk object at ...>\n\n>>> rebulk.matches(\"This match is grabbed only if it's the first match\")\n[<This match is grabbed:(0, 21)+name=grabbed>]\n>>> rebulk.matches(\"if it's NOT the first match, This match is NOT grabbed\")\n[]\n```\n\n\nChangelog\n=========\n\n<!--next-version-placeholder-->\n\n## v3.2.0 (2023-02-18)\n### Feature\n* **dependencies:** Add python 3.11 support and drop python 3.6 support ([`e4cb0d8`](https://github.com/Toilal/rebulk/commit/e4cb0d854cd8ea80da9abe46d2b3405a873e2020))\n\n### Fix\n* Remove pytest-runner from setup_requires ([`4483d17`](https://github.com/Toilal/rebulk/commit/4483d1777f6a61d20ed83da760663aec67e22042))\n\n## v3.1.0 (2021-11-04)\n### Feature\n* **defaults:** Add overrides support ([#25](https://github.com/Toilal/rebulk/issues/25)) 
([`f79e5ea`](https://github.com/Toilal/rebulk/commit/f79e5eab0806787ff19a4c668bf9f88413b67288))\n* **python:** Add python 3.10 support, drop python 3.5 support ([`a5e6eb7`](https://github.com/Toilal/rebulk/commit/a5e6eb7bba979ee51e1c6c1e186bd224c989dfdc))\n\n## v3.0.1 (2020-12-25)\n### Fix\n* **package:** Fix broken package `No such file or directory: 'CHANGELOG.md'` ([#24](https://github.com/Toilal/rebulk/issues/24)) ([`33895ff`](https://github.com/Toilal/rebulk/commit/33895ff358ff5051768fb98d4e840691e7af9bdf))\n\n### Documentation\n* **readme:** Add semantic release badge ([`78baca0`](https://github.com/Toilal/rebulk/commit/78baca0c529083d7f583ffec58aeb23734d67ce5))\n* **readme:** Fix title ([`d5d4db5`](https://github.com/Toilal/rebulk/commit/d5d4db5cd7f6e2cb1308acd26bfb98838815fad4))\n\n## v3.0.0 (2020-12-23)\n### Feature\n* **regex:** Replace REGEX_DISABLED environment variable with REBULK_REGEX_ENABLED ([`d5a8cad`](https://github.com/Toilal/rebulk/commit/d5a8cad6281533ee549a46ca70e1a25e5777eda3))\n* Add python 3.8/3.9 support, drop python 2.7/3.4 support ([`048a15f`](https://github.com/Toilal/rebulk/commit/048a15f90833ba8d33ea84d56e9955d31b514dc3))\n\n### Breaking\n* regex module is now disabled by default, even if it's available in the python interpreter. You have to set REBULK_REGEX_ENABLED=1 in your environment to enable it, as this module may cause some issues.  ([`d5a8cad`](https://github.com/Toilal/rebulk/commit/d5a8cad6281533ee549a46ca70e1a25e5777eda3))\n* Python 2.7 and 3.4 support have been dropped  ([`048a15f`](https://github.com/Toilal/rebulk/commit/048a15f90833ba8d33ea84d56e9955d31b514dc3))\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "Rebulk - Define simple search patterns in bulk to perform advanced matching on any string.",
    "version": "3.2.0",
    "project_urls": {
        "Download": "https://pypi.python.org/packages/source/r/rebulk/rebulk-3.2.0.tar.gz",
        "Homepage": "https://github.com/Toilal/rebulk/"
    },
    "split_keywords": [
        "re",
        "regexp",
        "regular",
        "expression",
        "search",
        "pattern",
        "string",
        "match"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "844ddf073d593f7e7e4a5a7e19148b2e9b4ae63b4ddcbb863f1e7bb2b6f19c62",
                "md5": "4c0c99c7bc592964d05a46a32bcc0a19",
                "sha256": "6bc31ae4b37200623c5827d2f539f9ec3e52b50431322dad8154642a39b0a53e"
            },
            "downloads": -1,
            "filename": "rebulk-3.2.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "4c0c99c7bc592964d05a46a32bcc0a19",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 56298,
            "upload_time": "2023-02-18T09:10:12",
            "upload_time_iso_8601": "2023-02-18T09:10:12.435298Z",
            "url": "https://files.pythonhosted.org/packages/84/4d/df073d593f7e7e4a5a7e19148b2e9b4ae63b4ddcbb863f1e7bb2b6f19c62/rebulk-3.2.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "f20624c69f8d707c9eefc1108a64e079da56b5f351e3f59ed76e8f04b9f3e296",
                "md5": "e2c88915303b311cea24b200ab332375",
                "sha256": "0d30bf80fca00fa9c697185ac475daac9bde5f646ce3338c9ff5d5dc1ebdfebc"
            },
            "downloads": -1,
            "filename": "rebulk-3.2.0.tar.gz",
            "has_sig": false,
            "md5_digest": "e2c88915303b311cea24b200ab332375",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 261685,
            "upload_time": "2023-02-18T09:10:14",
            "upload_time_iso_8601": "2023-02-18T09:10:14.378143Z",
            "url": "https://files.pythonhosted.org/packages/f2/06/24c69f8d707c9eefc1108a64e079da56b5f351e3f59ed76e8f04b9f3e296/rebulk-3.2.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-02-18 09:10:14",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Toilal",
    "github_project": "rebulk",
    "travis_ci": false,
    "coveralls": true,
    "github_actions": true,
    "requirements": [],
    "tox": true,
    "lcname": "rebulk"
}
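
The `holes` method listed in the `Matches` section above returns a *hole* match for each range that no match covers. The stdlib-only sketch below illustrates the idea and should be read only as an illustration. It is not rebulk's implementation, and `holes_sketch` is a hypothetical name. It assumes matches are plain `(start, end)` spans, and it leaves out rebulk's `formatter`, `ignore`, `predicate` and `index` parameters.

```python
from typing import List, Optional, Sequence, Tuple


def holes_sketch(
    text: str,
    spans: Sequence[Tuple[int, int]],
    start: int = 0,
    end: Optional[int] = None,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) for each gap in [start, end) not covered by spans.

    Hypothetical helper: spans may overlap or be unsorted; end defaults to len(text).
    """
    if end is None:
        end = len(text)
    holes = []
    cursor = start  # first position not yet known to be covered
    for span_start, span_end in sorted(spans):
        if span_end <= cursor or span_start >= end:
            continue  # span lies entirely before the cursor or after the range
        if span_start > cursor:
            # Uncovered text between the cursor and this span is a hole.
            holes.append((cursor, span_start, text[cursor:span_start]))
        cursor = max(cursor, span_end)
    if cursor < end:
        # Trailing uncovered text up to the end of the range.
        holes.append((cursor, end, text[cursor:end]))
    return holes
```

For example, `holes_sketch("abc-123-xyz", [(0, 3), (8, 11)])` returns `[(3, 8, '-123-')]`, which is the text left between the two spans.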

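The `first_value` and `enforce_list` options of `Matches.to_dict`, described in the `Matches` section above, are easiest to compare on plain data. The sketch below is a simplified model, not rebulk's code: `to_dict_sketch` is a hypothetical name, and matches are assumed to be `(name, value)` pairs already in position order. It reproduces the `MatchesDict([('episode', [14, 15, 16, 17]), ('version', 2)])` output shown earlier, where the default `first_value=False` gathers the distinct values of a name into a list. It also assumes that `first_value=True` keeps only the first value.

```python
from collections import OrderedDict
from typing import Any, Iterable, Tuple


def to_dict_sketch(
    pairs: Iterable[Tuple[str, Any]],
    first_value: bool = False,
    enforce_list: bool = False,
) -> "OrderedDict[str, Any]":
    """Merge (name, value) pairs, given in match order, into an ordered dict.

    Hypothetical helper modelling the documented options; it does not use rebulk
    and ignores `details` and `values_list`.
    """
    result: "OrderedDict[str, Any]" = OrderedDict()
    for name, value in pairs:
        if name not in result:
            # First value for this name: wrap it in a list only if enforce_list is set.
            result[name] = [value] if enforce_list else value
            continue
        if first_value:
            continue  # keep only the first value found for this name
        current = result[name]
        if not isinstance(current, list):
            if current == value:
                continue  # repeated value: nothing new to record
            current = result[name] = [current]  # promote the single value to a list
        if value not in current:
            current.append(value)  # collect distinct values only
    return result


if __name__ == "__main__":
    # Values in the order of the "This is E14v2-15-16-17" example.
    pairs = [("episode", 14), ("version", 2), ("episode", 15), ("episode", 16), ("episode", 17)]
    print(dict(to_dict_sketch(pairs)))                     # {'episode': [14, 15, 16, 17], 'version': 2}
    print(dict(to_dict_sketch(pairs, first_value=True)))   # {'episode': 14, 'version': 2}
    print(dict(to_dict_sketch(pairs, enforce_list=True)))  # {'episode': [14, 15, 16, 17], 'version': [2]}
```

The model leaves out `details`, which swaps values for whole `Match` objects, and the `values_list` attribute of the returned dict.
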
Rémi Alvergnat