| Field | Value |
|---|---|
| Name | PyPcre |
| Version | 0.2.4 |
| home_page | None |
| Summary | Modern, GIL-friendly, Fast Python bindings for PCRE2 with auto caching and JIT of compiled patterns. |
| upload_time | 2025-10-20 10:17:54 |
| maintainer | None |
| docs_url | None |
| author | None |
| requires_python | >=3.9 |
| license | None |
| keywords | regex, pcre2, bindings |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

<!--
# SPDX-FileCopyrightText: 2025 ModelCloud.ai
# SPDX-FileCopyrightText: 2025 qubitium@modelcloud.ai
# SPDX-License-Identifier: Apache-2.0
# Contact: qubitium@modelcloud.ai, x.com/qubitium
-->
# PyPcre (Python Pcre2 Binding)
Modern `nogil` Python bindings for the PCRE2 library with stdlib `re` API compatibility.
<p align="center">
<a href="https://github.com/ModelCloud/PyPcre/releases" style="text-decoration:none;"><img alt="GitHub release" src="https://img.shields.io/github/release/ModelCloud/Pcre.svg"></a>
<a href="https://pypi.org/project/PyPcre/" style="text-decoration:none;"><img alt="PyPI - Version" src="https://img.shields.io/pypi/v/PyPcre"></a>
<!-- <a href="https://pepy.tech/projects/PyPcre" style="text-decoration:none;"><img src="https://static.pepy.tech/badge/PyPcre" alt="PyPI Downloads"></a> -->
<a href="https://github.com/ModelCloud/PyPcre/blob/main/LICENSE"><img src="https://img.shields.io/pypi/l/PyPcre"></a>
<a href="https://huggingface.co/modelcloud/"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-ModelCloud-%23ff8811.svg"></a>
</p>
## Latest News
* 10/12/2025 [0.2.3](https://github.com/ModelCloud/PyPcre/releases/tag/v0.2.3): 🤗 Full `GIL=0` compliance for Python >= 3.13T. Reduced cache thread contention. Improved performance for all APIs. Expanded CI test coverage. FreeBSD, Solaris, and Windows compatibility validated.
* 10/09/2025 [0.1.0](https://github.com/ModelCloud/PyPcre/releases/tag/v0.1.0): 🎉 First release. Thread safe, auto JIT, auto pattern caching and optimistic linking to system library for fast install.
## Why PyPcre:
PyPcre is a modern PCRE2 binding designed to be both fast and thread-safe in the `GIL=0` world. In the era of the global interpreter lock, Python had real threads but little real concurrency (with the exception of some low-level APIs and packages). In 2025, Python is moving toward a full `GIL=0` design, which unlocks true multi-threaded concurrency and finally brings Python into parity with other modern languages.
Many Python regular expression packages either segfault outright because they are not thread-safe under `GIL=0`, or suffer sub-optimal performance because they were not designed with threads in mind.
PyPcre is fully CI tested: every API and PCRE2 flag is exercised in a continuous integration environment maintained by the ModelCloud.AI team. Fuzz (clobber) tests are also run to catch memory-safety, accuracy, and memory-leak regressions.
Safety first: PyPcre optimistically links to the OS-provided `libpcre2` package so that it automatically benefits from upstream security patches. You can force a full source build by setting the `PYPCRE_BUILD_FROM_SOURCE=1` environment variable.
## Installation
```bash
pip install PyPcre
```
The package prioritizes linking against the system's `libpcre2-8` shared library, which keeps installation fast and lets PyPcre pick up the latest security patches shipped by the OS. See [Building](#building) for manual build details.
## Platform Support (Validated):
`Linux`, `MacOS`, `Windows`, `WSL`, `FreeBSD`, `Solaris`
## Usage
If you already rely on the standard library `re`, migrating is as
simple as changing your import:
```python
import pcre as re
```
The module-level entry points (`match`, `search`, `fullmatch`, `findall`,
`finditer`, `split`, `sub`, `subn`, `compile`, `escape`, `purge`) expose the
same call signatures as their `re` counterparts, making existing code work
unchanged. Every standard flag with a PCRE2 equivalent—`IGNORECASE`,
`MULTILINE`, `DOTALL`, `VERBOSE`, `ASCII`, and friends—is supported via the
re-exported constants and the `pcre.Flag` enum.
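Because the module-level helpers share `re`'s signatures, one optional way to adopt PyPcre incrementally is to fall back to the standard library when the package is not installed. This is a minimal sketch relying only on the helpers listed above; `extract_dates` and `redact_dates` are hypothetical example functions.

```python
try:
    import pcre as re  # PyPcre, when installed
except ImportError:
    import re  # stdlib fallback with the same call signatures

DATE = r"(\d{4})-(\d{2})-(\d{2})"


def extract_dates(text: str) -> list:
    """Hypothetical helper: return (year, month, day) tuples found in text."""
    return re.findall(DATE, text)


def redact_dates(text: str) -> str:
    """Hypothetical helper: replace every date with a placeholder via sub()."""
    return re.sub(DATE, "YYYY-MM-DD", text)
```

The calling code never needs to know which engine is active, which is the point of the drop-in import.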
### Sample Usage
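```python
from pcre import match, search, findall, compile, Flag

# \w+ captures the first word into the named group "word".
if match(r"(?P<word>\w+)", "hello world"):
    print("found word")

# Byte patterns work too; \d+ collects every run of digits.
pattern = compile(rb"\d+", flags=Flag.MULTILINE)
numbers = pattern.findall(b"line 1\nline 22")
```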
`pcre` mirrors the core helpers from Python’s standard library `re` module
(`match`, `search`, `fullmatch`, `finditer`, `findall`, and `compile`) while
exposing PCRE2’s extended flag set through the Pythonic `Flag` enum
(`Flag.CASELESS`, `Flag.MULTILINE`, `Flag.UTF`, ...).
### Stdlib `re` compatibility
- Module-level helpers and the `Pattern` class follow the same call shapes as
the standard library `re` module, including `pos`, `endpos`, and `flags`
behaviour.
- `Pattern` mirrors `re.Pattern` attributes like `.pattern`, `.groupindex`,
and `.groups`, while `Match` objects surface the familiar `.re`, `.string`,
`.pos`, `.endpos`, `.lastindex`, `.lastgroup`, `.regs`, and `.expand()` API.
- Substitution helpers enforce the same type rules as the standard library
`re` module: string patterns require string replacements, byte patterns
require bytes-like replacements, and callable replacements receive the
wrapped `Match`.
- `compile()` accepts native `Flag` values as well as compatible
`re.RegexFlag` members from the standard library. Supported stdlib flags
map 1:1 to PCRE2 options (`IGNORECASE→CASELESS`, `MULTILINE→MULTILINE`,
`DOTALL→DOTALL`, `VERBOSE→EXTENDED`); passing unsupported stdlib flags
raises a compatibility `ValueError` to prevent silent divergences.
- `pcre.escape()` delegates directly to `re.escape` for byte and text
patterns so escaping semantics remain identical.
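The attribute and substitution rules above can be exercised with a short script. It sticks to calls the list says are mirrored from `re` and uses an optional stdlib fallback so it also runs where PyPcre is not installed; the `key=value` pattern is only an illustration.

```python
try:
    import pcre as re
except ImportError:
    import re

pair = re.compile(r"(?P<key>\w+)=(?P<val>\d+)")
text = "a=1 b=22"

# pos=3 starts the scan after "a=1", so the second pair is found.
m = pair.search(text, 3)
print(m.group("key"), m.group("val"), m.span())   # b 22 (4, 8)
print(m.pos, m.lastindex, m.lastgroup)            # 3 2 val
print(dict(pair.groupindex), pair.groups)         # {'key': 1, 'val': 2} 2


# Callable replacements receive the Match object for each hit.
def double(match):
    return f"{match.group('key')}:{int(match.group('val')) * 2}"


print(pair.sub(double, text))                     # a:2 b:44
# Template replacements use the same \g<name> syntax as re.
print(pair.sub(r"\g<val>=\g<key>", text))         # 1=a 22=b
```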
### `regex` package compatibility
The [`regex`](https://pypi.org/project/regex/) package interprets
`\uXXXX` and `\UXXXXXXXX` escapes as Unicode code points, while PCRE2 expects
hexadecimal escapes to use the `\x{...}` form. Enable `Flag.COMPAT_UNICODE_ESCAPE` to
translate those escapes automatically when compiling patterns:
```python
from pcre import compile, Flag

# \U0001F600 (regex-package syntax) is rewritten to PCRE2's \x{...} form.
pattern = compile(r"\U0001F600", flags=Flag.COMPAT_UNICODE_ESCAPE)
assert pattern.pattern == r"\x{0001F600}"
```
Set the default behaviour globally with `pcre.configure(compat_regex=True)`
so that subsequent calls to `compile()` and the module-level helpers apply
the conversion without repeating the flag.
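To make the conversion concrete, here is a hypothetical pure-Python sketch of the kind of rewrite `Flag.COMPAT_UNICODE_ESCAPE` describes. It is not PyPcre's implementation; the helper name `translate_unicode_escapes` is illustrative, and it assumes an escaped backslash (`\\`) must be left alone so that `\\u0041` keeps meaning a literal backslash followed by `u0041`.

```python
import re

# Matches a backslash followed by: another backslash, \uXXXX, or \UXXXXXXXX.
_ESCAPE = re.compile(r"\\(\\|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8}))")


def translate_unicode_escapes(pattern: str) -> str:
    """Hypothetical helper: rewrite \\uXXXX and \\UXXXXXXXX as PCRE2 \\x{...}."""

    def replace(m: re.Match) -> str:
        digits = m.group(2) or m.group(3)
        if digits is None:
            # An escaped backslash: keep it so the text after it is not
            # mistaken for the start of an escape.
            return m.group(0)
        return "\\x{" + digits + "}"

    return _ESCAPE.sub(replace, pattern)


print(translate_unicode_escapes(r"caf\u00e9"))    # caf\x{00e9}
print(translate_unicode_escapes(r"\U0001F600"))   # \x{0001F600}
print(translate_unicode_escapes(r"\\u0041"))      # \\u0041 (unchanged)
```

Scanning left to right and consuming `\\` pairs as a unit is what keeps escaped backslashes from being misread.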
### Automatic pattern caching
`pcre.compile()` caches the final `Pattern` wrapper for up to 128
unique `(pattern, flags)` pairs when the pattern object is hashable. By default
the cache is **thread-local**, keeping per-thread LRU stores so workers do not
contend with one another. Adjust the capacity with `pcre.set_cache_limit(n)`—pass
`0` to disable caching completely or `None` for an unlimited cache—and check the
current limit with `pcre.get_cache_limit()`. The cache can be emptied at any time
with `pcre.clear_cache()`.
Applications that prefer the historic global cache can opt back in by setting
`PYPCRE_CACHE_PATTERN_GLOBAL=1` in the environment **before importing** `pcre`,
that is, before any pattern is compiled. Runtime switching is no longer
supported; changing the value after patterns have been compiled raises
`RuntimeError`.
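The per-thread LRU behaviour can be modelled in a few lines of standard-library Python. This is a hypothetical sketch of the semantics described above (a `limit` of `0` disables caching, `None` means unlimited, unhashable keys bypass the cache), not PyPcre's internal code; `ThreadLocalPatternCache` and its `compiler` parameter are illustrative names.

```python
import re
import threading
from collections import OrderedDict


class ThreadLocalPatternCache:
    """Hypothetical per-thread LRU cache keyed on (pattern, flags)."""

    def __init__(self, limit=128, compiler=re.compile):
        self.limit = limit                # None = unlimited, 0 = caching disabled
        self._compiler = compiler
        self._local = threading.local()   # each thread sees its own .store

    def _store(self) -> OrderedDict:
        store = getattr(self._local, "store", None)
        if store is None:
            store = self._local.store = OrderedDict()
        return store

    def compile(self, pattern, flags=0):
        key = (pattern, flags)
        try:
            hash(key)
        except TypeError:
            return self._compiler(pattern, flags)  # unhashable: never cached
        if self.limit == 0:
            return self._compiler(pattern, flags)
        store = self._store()
        if key in store:
            store.move_to_end(key)                 # mark as most recently used
            return store[key]
        compiled = store[key] = self._compiler(pattern, flags)
        if self.limit is not None and len(store) > self.limit:
            store.popitem(last=False)              # evict the least recently used
        return compiled

    def clear(self):
        self._store().clear()                      # only the calling thread's store
```

Because each thread only touches its own `OrderedDict`, the sketch needs no lock; the trade-off is that a pattern used in N threads may be compiled up to N times.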
### Text versus bytes defaults
String patterns follow the same defaults as Python’s `re` module,
automatically enabling the `Flag.UTF` and `Flag.UCP` options so Unicode
pattern and character semantics “just work.” Byte patterns remain raw by
default—neither option is activated—so you retain full control over
binary-oriented matching. Explicitly set `Flag.NO_UTF`/`Flag.NO_UCP` if you
need to opt out for strings, or add the UTF/UCP flags yourself when compiling
bytes.
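The practical difference shows up with `\w` on non-ASCII text. This sketch uses the same optional stdlib fallback as above, since `re` applies matching defaults to `str` and `bytes` patterns; the sample words are arbitrary.

```python
try:
    import pcre as re
except ImportError:
    import re

text = "naïve café"

# str pattern: Unicode-aware \w (UTF + UCP in PyPcre, Unicode mode in re).
words = re.findall(r"\w+", text)
# bytes pattern: \w only covers ASCII letters, digits and underscore,
# so the UTF-8 encoded "ï" and "é" split the words apart.
raw_words = re.findall(rb"\w+", text.encode("utf-8"))

print(words)      # ['naïve', 'café']
print(raw_words)  # [b'na', b've', b'caf']
```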
### Working with compiled patterns
- `compile()` accepts either a pattern literal or an existing `Pattern`
instance, making it easy to mix compiled objects with the convenience
helpers.
- `Pattern.match/search/fullmatch/finditer/findall` accept optional
`pos`, `endpos`, and `options` arguments, mirroring the standard library
`re` module while letting you thread PCRE2 execution flags through
individual calls.
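A short sketch of reusing a compiled object and restricting the search window with `pos`/`endpos`. The PyPcre-specific `options` argument is left out because its accepted values are not listed here; the stdlib fallback keeps the example runnable without PyPcre.

```python
try:
    import pcre as re
except ImportError:
    import re

digits = re.compile(r"\d+")
again = re.compile(digits)        # an existing Pattern is accepted as input
subject = "a1 b22 c333"

print(again.pattern)                      # \d+
print(digits.findall(subject, 3))         # ['22', '333']  (scan starts at index 3)
print(digits.findall(subject, 0, 6))      # ['1', '22']    (subject treated as "a1 b22")
print(digits.match(subject, 1).group())   # 1  (match() anchors at pos, not at 0)
```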
### Threaded execution
- `pcre.parallel_map()` fans out work across a shared thread pool for
`match`, `search`, `fullmatch`, and `findall`. The helper preserves the
order of the provided subjects and returns the same result objects you’d
normally receive from the `Pattern` methods.
- The threaded backend activates only on machines with at least eight CPU
cores; otherwise execution falls back to the sequential path regardless of
flags or configuration.
- Threading is **enabled by default** when Python runs without the GIL
  (e.g. `python -X gil=0` or `PYTHON_GIL=0`). When the GIL is active, the default
  falls back to sequential execution to avoid needless overhead.
- With auto threading enabled (`configure_threads(enabled=True)`), the pool
is only engaged when at least one subject is larger than the configured
threshold (60 kB by default). Smaller jobs run sequentially to avoid the
cost of thread hand-offs; adjust the boundary via
`configure_threads(threshold=...)`.
- Use `Flag.THREADS` to force threaded execution for a specific pattern or
`Flag.NO_THREADS` to lock it to sequential mode regardless of global
settings.
- `pcre.configure_thread_pool(max_workers=...)` controls the size of the
shared executor (capped to half the available CPUs); call it with
`preload=True` to spin the pool up eagerly, and `shutdown_thread_pool()`
to tear it down manually if needed.
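The dispatch rules above can be summarised as a small decision function. The following is a hypothetical sketch built on the standard library's `ThreadPoolExecutor` and `re`; it does not call `pcre.parallel_map()` (whose exact signature is not given here), and `should_thread`, `parallel_findall`, the rule ordering, and reading "60 kB" as 60,000 characters are all assumptions.

```python
import os
import re
import sys
from concurrent.futures import ThreadPoolExecutor

MIN_CORES = 8          # from the text: fewer cores means always sequential
THRESHOLD = 60_000     # assumption: "60 kB" read as 60,000 characters


def gil_enabled() -> bool:
    # sys._is_gil_enabled() exists on CPython 3.13+; earlier builds always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def should_thread(subjects, *, force=None, enabled=None, cores=None, gil=None,
                  threshold=THRESHOLD):
    """Hypothetical decision rule modelled on the bullets above."""
    if cores is None:
        cores = os.cpu_count() or 1
    if cores < MIN_CORES:
        return False                  # overrides every flag and setting
    if force is not None:
        return force                  # analogue of Flag.THREADS / Flag.NO_THREADS
    if enabled is None:
        if gil is None:
            gil = gil_enabled()
        enabled = not gil             # default: on without the GIL, off with it
    if not enabled:
        return False
    return any(len(s) > threshold for s in subjects)


def parallel_findall(pattern, subjects, *, max_workers=None, **decision):
    """Run pattern.findall over subjects, threaded only when should_thread() agrees."""
    if not should_thread(subjects, **decision):
        return [pattern.findall(s) for s in subjects]
    if max_workers is None:
        max_workers = max(1, (os.cpu_count() or 2) // 2)  # "half the CPUs" cap
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so ordering is preserved.
        return list(pool.map(pattern.findall, subjects))
```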
### Performance considerations
- **Precompile for hot loops.** The module-level helpers mirror the `re`
API and route through the shared compilation cache, but the extra call
plumbing still adds overhead. With a simple pattern like `"fo"`, using
the low-level `pcre_ext_c.Pattern` directly costs ~0.60 µs per call,
whereas the high-level `pcre.match()` helper lands at ~4.4 µs per call
under the same workload. For sustained loops, create a `Pattern` object
once and reuse it.
- **Benchmark toggles.** The extension defaults to the fastest safe
  configuration, but you can flip individual knobs back to the legacy
  behaviour by setting environment variables *before* importing `pcre`:

  | Configuration                          | Per-call time |
  |----------------------------------------|---------------|
  | _(baseline, no variables set)_         | 0.60 µs       |
  | `PYPCRE_DISABLE_CONTEXT_CACHE=1`       | 0.60 µs       |
  | `PYPCRE_FORCE_JIT_LOCK=1`              | 0.60 µs       |
  | `pcre.match()` helper (for comparison) | 4.43 µs       |

  The toggles reintroduce the legacy GIL hand-off, per-call match-context
  allocation, and explicit locks so you can quantify the impact of each
  optimisation on your workload. Measurements were taken on CPython 3.14 (rc3)
  with 200 000 evaluations of `pcre_ext_c.compile("fo").match("foobar")`; absolute
  values will vary by platform, but the relative differences are
  representative. Leave the variables unset in production to keep the new fast
  paths active.
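A quick way to check the precompile advice on your own machine is the stdlib `timeit` module. The sketch below times both call styles with the same evaluation count as the measurements quoted above; it makes no claim about the numbers you will see. To measure the environment toggles, run it in separate processes with the variables set before `pcre` is imported.

```python
import timeit

try:
    import pcre as re
except ImportError:
    import re

N = 200_000
compiled = re.compile("fo")

# Both timings include the cost of calling a lambda, so compare them
# with each other rather than with the table.
helper_s = timeit.timeit(lambda: re.match("fo", "foobar"), number=N)
reused_s = timeit.timeit(lambda: compiled.match("foobar"), number=N)

print(f"module-level helper: {helper_s / N * 1e6:.2f} µs/call")
print(f"precompiled Pattern: {reused_s / N * 1e6:.2f} µs/call")
```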
### JIT Pattern Compilation and Execution
Pcre2’s JIT compiler is enabled by default for every compiled pattern. The
wrapper exposes two complementary ways to adjust that behaviour:
- Toggle the global default at runtime with `pcre.configure(jit=False)` to
turn JIT off (call `pcre.configure(jit=True)` to turn it back on).
- Override the default per pattern using the Python-only flags `Flag.JIT`
and `Flag.NO_JIT`:
```python
from pcre import compile, configure, Flag
configure(jit=False) # disable JIT globally
baseline = compile(r"expr") # JIT disabled
fast = compile(r"expr", flags=Flag.JIT) # force-enable for this pattern
slow = compile(r"expr", flags=Flag.NO_JIT) # force-disable for this pattern
```
## Pattern cache
- `pcre.compile()` caches hashable `(pattern, flags)` pairs, keeping up to 128 entries per thread by default.
- Set `PYPCRE_CACHE_PATTERN_GLOBAL=1` before importing `pcre` if you need a shared, process-wide cache instead of isolated thread stores.
- Use `pcre.clear_cache()` when you need to free the active cache proactively.
- Non-hashable pattern objects skip the cache and are compiled each time.
## Default flags for text patterns
- String patterns enable `Flag.UTF` and `Flag.UCP` automatically so behaviour matches `re`.
- Byte patterns keep both flags disabled; opt in manually if Unicode semantics are desired.
- Explicitly supply `Flag.NO_UTF`/`Flag.NO_UCP` to override the defaults for strings.
## Additional usage notes
- All top-level helpers (`match`, `search`, `fullmatch`, `finditer`, `findall`) defer to the cached compiler.
- Compiled `Pattern` objects expose `.pattern`, `.flags`, `.jit`, and `.groupindex` for introspection.
- Execution helpers accept `pos`, `endpos`, and `options`, allowing you to thread PCRE2 execution flags per call.
## Memory allocation
- By default PyPcre uses CPython's `PyMem` allocator.
- Override the allocator explicitly by setting `PYPCRE_ALLOCATOR` to one of
  `pymem`, `malloc`, `jemalloc`, or `tcmalloc` before importing the module. The
  optional allocators are loaded at runtime with `dlopen`, so the build needs no
  additional link flags, and nothing extra is required on systems where they are absent.
- Call `pcre_ext_c.get_allocator()` to inspect which backend is active at
runtime.
## Building
The extension links against an existing PCRE2 installation (the `libpcre2-8`
variant). Install the development headers for your platform before building,
for example `apt install libpcre2-dev` on Debian/Ubuntu, `dnf install pcre2-devel`
on Fedora/RHEL derivatives, or `brew install pcre2` on macOS.
If the headers or library live in a non-standard location you can export one
or more of the following environment variables prior to invoking the build
(`pip install .`, `python -m build`, etc.):
- `PYPCRE_ROOT`
- `PYPCRE_INCLUDE_DIR`
- `PYPCRE_LIBRARY_DIR`
- `PYPCRE_LIBRARY_PATH` *(pathsep-separated directories or explicit library files to
prioritise when resolving `libpcre2-8`)*
- `PYPCRE_LIBRARIES`
- `PYPCRE_CFLAGS`
- `PYPCRE_LDFLAGS`
When `pkg-config` is available the build will automatically pick up the
required include and link flags via `pkg-config --cflags/--libs libpcre2-8`.
Without `pkg-config`, the build script scans common installation prefixes for
Linux distributions (Debian, Ubuntu, Fedora/RHEL/CentOS, openSUSE, Alpine),
FreeBSD, macOS (including Homebrew), and Solaris to locate the headers and
libraries.
If your system ships `libpcre2-8` under `/usr` but you also maintain a
manually built copy under `/usr/local`, export `PYPCRE_LIBRARY_PATH` (and, if
needed, a matching `PYPCRE_INCLUDE_DIR`) so the build links against the desired
location.
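The documented semantics of `PYPCRE_LIBRARY_PATH` (an `os.pathsep`-separated list of directories or explicit library files, in priority order) can be illustrated with a small resolver. This is a hypothetical sketch, not PyPcre's build script: the function name and the candidate file names are assumptions, and real systems may ship other versioned names that a production resolver would also need to consider.

```python
import os
from pathlib import Path

# Assumed candidate file names for libpcre2-8 (illustrative, not exhaustive).
LIB_NAMES = ("libpcre2-8.so", "libpcre2-8.so.0", "libpcre2-8.dylib", "pcre2-8.lib")


def candidate_libraries(value: str) -> list:
    """Hypothetical: expand a PYPCRE_LIBRARY_PATH-style value, in priority order."""
    found = []
    for entry in value.split(os.pathsep):
        if not entry:
            continue  # tolerate empty segments such as a trailing separator
        path = Path(entry)
        if path.is_file():
            found.append(path)  # an explicit library file is used as given
        elif path.is_dir():
            found.extend(path / name for name in LIB_NAMES if (path / name).is_file())
    return found
```

For example, `candidate_libraries(os.environ.get("PYPCRE_LIBRARY_PATH", ""))` lists what such a resolver would try first.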
## Raw data

```json
{
  "_id": null,
  "home_page": null,
  "name": "PyPcre",
  "maintainer": null,
  "docs_url": null,
  "requires_python": ">=3.9",
  "maintainer_email": null,
  "keywords": "regex, pcre2, bindings",
  "author": null,
  "author_email": "ModelCloud <qubitium@modelcloud.ai>",
  "download_url": "https://files.pythonhosted.org/packages/c3/a1/06b32119511976266c41e05575719d1e7f4c7eee57ab49e891f2199fd6fb/pypcre-0.2.4.tar.gz",
  "platform": null,
  "bugtrack_url": null,
  "license": null,
  "summary": "Modern, GIL-friendly, Fast Python bindings for PCRE2 with auto caching and JIT of compiled patterns.",
  "version": "0.2.4",
  "project_urls": {
    "Homepage": "https://github.com/ModelCloud/pcre"
  },
  "split_keywords": [
    "regex",
    " pcre2",
    " bindings"
  ],
  "urls": [
    {
      "comment_text": null,
      "digests": {
        "blake2b_256": "c3a106b32119511976266c41e05575719d1e7f4c7eee57ab49e891f2199fd6fb",
        "md5": "05660241d8c624016b9071583ab2d470",
        "sha256": "c29c0aa8b62d45379cc924be7d186d02ade62ec02730c93dbe9118f9f3482003"
      },
      "downloads": -1,
      "filename": "pypcre-0.2.4.tar.gz",
      "has_sig": false,
      "md5_digest": "05660241d8c624016b9071583ab2d470",
      "packagetype": "sdist",
      "python_version": "source",
      "requires_python": ">=3.9",
      "size": 116333,
      "upload_time": "2025-10-20T10:17:54",
      "upload_time_iso_8601": "2025-10-20T10:17:54.274055Z",
      "url": "https://files.pythonhosted.org/packages/c3/a1/06b32119511976266c41e05575719d1e7f4c7eee57ab49e891f2199fd6fb/pypcre-0.2.4.tar.gz",
      "yanked": false,
      "yanked_reason": null
    }
  ],
  "upload_time": "2025-10-20 10:17:54",
  "github": true,
  "gitlab": false,
  "bitbucket": false,
  "codeberg": false,
  "github_user": "ModelCloud",
  "github_project": "pcre",
  "travis_ci": false,
  "coveralls": false,
  "github_actions": true,
  "lcname": "pypcre"
}
```