# `log-surgeon-ffi`
`log-surgeon-ffi` provides Python foreign function interface (FFI) bindings for
[`log-surgeon`](https://github.com/y-scope/log-surgeon).
---
## Quick navigation
[**Overview**](#overview)
* [Why `log-surgeon`?](#why-log-surgeon)
* [Key capabilities](#key-capabilities)
* [Structured output and downstream capabilities](#structured-output-and-downstream-capabilities)
* [When to use `log-surgeon`](#when-to-use-log-surgeon)
[**Getting started**](#getting-started)
* [System requirements](#system-requirements)
* [Installation](#installation)
* [First steps](#first-steps)
* [Important prerequisites](#important-prerequisites)
* [Quick start examples](#quick-start-examples)
[**Key concepts**](#key-concepts)
* [Token-based parsing and delimiters](#token-based-parsing-and-delimiters)
* [Named capture groups](#named-capture-groups)
* [Using raw f-strings for regex patterns](#using-raw-f-strings-for-regex-patterns)
[**Reference**](#reference)
* [Parser API](#parser)
* [Query API](#query)
* [PATTERN constants](#pattern)
[**Development**](#development)
* [Building from source](#building-from-source)
* [Running tests](#running-tests)
---
## Overview
[`log-surgeon`](https://github.com/y-scope/log-surgeon) is a high-performance C++ library that
enables efficient extraction of structured information from unstructured log files.
### Why `log-surgeon`?
Traditional regex engines are often slow to execute, error-prone, and costly to maintain. For
example, Meta uses RE2 (a state-of-the-art regex engine) to parse logs, but still faces scalability
and maintenance challenges that limit extraction to a small set of fields such as timestamps,
levels, and component names.
`log-surgeon` streamlines the process by identifying, extracting, and labeling variable values with
semantic context, and then inferring a log template in a single pass. `log-surgeon` is also built to
accommodate structural variability. Values may shift position, appear multiple times, or change order
entirely, but with `log-surgeon`, you simply define the variable patterns, and `log-surgeon`
JIT-compiles a tagged-DFA state machine to drive the full pipeline.
### Key capabilities
* **Extract variables** from log messages using regex patterns with named capture groups
* **Generate log types** (templates) automatically for log analysis
* **Parse streams** efficiently for large-scale log processing
* **Export data** to pandas DataFrames and PyArrow Tables
### Structured output and downstream capabilities
Unstructured log data is automatically transformed into structured semantic representations.
* **Log types (templates)**: Variables are replaced with placeholders to form reusable templates.
For example, roughly 200,000 Spark log messages can be reduced to about 55 distinct templates, which
supports pattern analysis and anomaly detection.
* **Semantic variables**: Extracted key-value pairs with semantic context (e.g., `app_id`,
`app_name`, `worker_id`) can be used directly for analysis.
This structured output unlocks powerful downstream capabilities:
* **Knowledge graph construction.** Build relationship graphs between entities extracted from logs
(e.g., linking `app_id` → `app_name` → `worker_id`). The structured output fits tools such as
[Stitch](https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhao), which
uses flow reconstruction from logs to perform non-intrusive performance profiling and debugging
across distributed systems.
* **Template-based summarization.** Compress massive datasets into compact template sets for human
and agent consumption. Templates act as natural tokens for LLMs. Instead of millions of raw lines,
provide a small number of distinct templates with statistics.
* **Hybrid search.** Combine free-text search with structured queries. Log types enable
auto-completion and query suggestions on large datasets. Instead of searching through millions of
raw log lines, search across a compact set of templates first. Then project and filter on
structured variables (e.g., `status == "ERROR"`, `response_time > 1000`), and aggregate for
analysis.
* **Agentic automation.** Agents can query by template, analyze variable distributions, identify
anomalies, and automate debugging tasks using structured signals rather than raw text.
### When to use `log-surgeon`
**Good fit**
* Large-scale log processing (millions of lines)
* Extracting structured data from semi-structured logs
* Generating log templates for analytics
* Multi-line log events (stack traces, JSON dumps)
* Performance-critical parsing
**Not ideal**
* Simple one-off text extraction (use Python `re` module)
* Highly irregular text without consistent delimiters
* Patterns requiring full PCRE features (lookahead, backreferences)
---
## Getting started
Follow the instructions below to get started with `log-surgeon-ffi`.
### System requirements
- Python >= 3.9
- pandas
- pyarrow
#### Build requirements
- C++20 compatible compiler
- CMake >= 3.15
### Installation
To install the library with pandas and PyArrow support for DataFrame/Arrow table exports, run the
following command:
```bash
pip install log-surgeon-ffi
```
To verify your installation, run the following command:
```bash
python -c "from log_surgeon import Parser; print('Installation successful.')"
```
**Note:** pandas and PyArrow are installed by default for convenience. The core parser itself only
needs them if you use the DataFrame or Arrow export features.
### First steps
After installation, follow these steps:
1. **Read [Key Concepts](#key-concepts).** Token-based parsing differs from traditional regex.
2. **Run a [Quick start example](#quick-start-examples)** to see how it works.
3. **Use `rf"..."` for patterns** to avoid escaping issues. See
[Using Raw f-strings](#using-raw-f-strings-for-regex-patterns).
4. **Check out [examples/](examples/)** to study some complete working examples.
---
> ### Important prerequisites
>
> `log-surgeon` uses token-based parsing, and its regex behavior differs from traditional engines.
> Read the [Key Concepts](#key-concepts) section before writing patterns.
>
> Critical differences between token-based parsing and traditional regex behavior:
>
> * `.*` only matches within a single token (not across delimiters)
> * `abc|def` requires grouping: use `(abc)|(def)` instead
> * Use `{0,1}` for optional patterns, NOT `?`
>
> **Tip:** Use raw f-strings (`rf"..."`) for regex patterns. See
> [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for more details.
---
### Quick start examples
Use the following examples to get started.
#### Basic parsing
The following code parses a simple log event with `log-surgeon`.
```python
from log_surgeon import Parser, PATTERN
# Parse a sample log event
log_line = "16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\n"
# Create a parser and define extraction patterns
parser = Parser()
parser.add_var("resource", rf"(?<memory_gb>{PATTERN.FLOAT}) GiB ram")
parser.compile()
# Parse a single event
event = parser.parse_event(log_line)
# Access extracted data
print(f"Message: {event.get_log_message().strip()}")
print(f"LogType: {event.get_log_type().strip()}")
print(f"Parsed Logs: {event}")
```
**Output:**
```
Message: 16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram
LogType: 16/05/04 04:24:58 INFO Registering worker with 1 core and <memory_gb> GiB ram
Parsed Logs: {
"memory_gb": "4.0"
}
```
We can see that the parser extracted structured data from the unstructured log line:
* **Message**: The original log line
* **LogType**: Template with variable placeholder `<memory_gb>` showing the pattern structure
* **Parsed variables**: Successfully extracted `memory_gb` value of "4.0" from the pattern match
#### Try it yourself
Copy this code and modify the pattern to extract both `memory_gb` AND `cores`:
```python
from log_surgeon import Parser, PATTERN
log_line = "16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\n"
parser = Parser()
# TODO: Add pattern to capture both "1" (cores) and "4.0" (memory_gb)
parser.add_var("resource", rf"...")
parser.compile()
event = parser.parse_event(log_line)
print(f"Cores: {event['cores']}, Memory: {event['memory_gb']}")
```
<details>
<summary>Solution</summary>
```python
parser.add_var("resource", rf"(?<cores>\d+) core and (?<memory_gb>{PATTERN.FLOAT}) GiB ram")
```
</details>
---
#### Multiple capture groups
The following code parses a more complex, multi-line log event.
```python
from log_surgeon import Parser, PATTERN
# Parse a sample log event
log_line = """16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:750)
"""
# Create a parser and define extraction patterns
parser = Parser()
# Add timestamp pattern
parser.add_timestamp("TIMESTAMP_SPARK_1_6", rf"\d{{2}}/\d{{2}}/\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
# Add variable patterns
parser.add_var("SYSTEM_LEVEL", rf"(?<level>(INFO)|(WARN)|(ERROR))")
parser.add_var("SPARK_HOST_IP_PORT", rf"(?<spark_host>spark\-{PATTERN.INT})/(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})")
parser.add_var(
"SYSTEM_EXCEPTION",
rf"(?<system_exception_type>({PATTERN.JAVA_PACKAGE_SEGMENT})+[{PATTERN.JAVA_IDENTIFIER_CHARSET}]*Exception): "
rf"(?<system_exception_msg>{PATTERN.LOG_LINE})"
)
parser.add_var(
    "SYSTEM_STACK_TRACE",
    rf"\s{{1,4}}at (?<system_stack>{PATTERN.JAVA_STACK_LOCATION})"
)
parser.compile()
# Parse a single event
event = parser.parse_event(log_line)
# Access extracted data
print(f"Message: {event.get_log_message().strip()}")
print(f"LogType: {event.get_log_type().strip()}")
print(f"Parsed Logs: {event}")
```
**Output:**
```
Message: 16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:750)
LogType: <timestamp> <level> server.TransportChannelHandler: Exception in connection from <spark_host>/<system_ip>:<system_port>
<system_exception_type>: <system_exception_msg><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine>
Parsed Logs: {
"timestamp": "16/05/04 12:22:37",
"level": "WARN",
"spark_host": "spark-35",
"system_ip": "192.168.10.50",
"system_port": "55392",
"system_exception_type": "java.io.IOException",
"system_exception_msg": "Connection reset by peer",
"system_stack": [
"sun.nio.ch.FileDispatcherImpl.read0(Native Method)",
"sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)",
"sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)",
"sun.nio.ch.IOUtil.read(IOUtil.java:192)",
"sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)",
"io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)",
"io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)",
"io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)",
"io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)",
"io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)",
"io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)",
"io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)",
"io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)",
"io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)",
"java.lang.Thread.run(Thread.java:750)"
]
}
```
The parser extracted **multiple named capture groups** from a complex multi-line Java stack trace:
* **Scalar fields**: `timestamp`, `level`, `spark_host`, `system_ip`, `system_port`,
`system_exception_type`, `system_exception_msg`
* **Array field**: `system_stack` contains all 15 stack trace locations (demonstrates automatic
aggregation of repeated capture groups)
* **LogType**: Template shows the structure with `<newLine>` markers indicating line boundaries in
the original log
---
#### Stream parsing
When parsing log streams or files, timestamps are **required** to perform contextual anchoring.
Timestamps act as delimiters that separate individual log events, enabling the parser to correctly
group multi-line entries (like stack traces) into single events.
```python
from log_surgeon import Parser, PATTERN
# Parse from string (automatically converted to io.StringIO)
SAMPLE_LOGS = """16/05/04 04:31:13 INFO master.Master: Registering app SparkSQL::192.168.10.76
16/05/04 12:32:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:750)
16/05/04 04:37:53 INFO master.Master: 192.168.10.76:41747 got disassociated, removing it.
"""
# Define parser with patterns
parser = Parser()
# REQUIRED: Timestamp acts as contextual anchor to separate individual log events in the stream
parser.add_timestamp("TIMESTAMP_SPARK_1_6", rf"\d{{2}}/\d{{2}}/\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
parser.add_var("SYSTEM_LEVEL", rf"(?<level>(INFO)|(WARN)|(ERROR))")
parser.add_var("SPARK_APP_NAME", rf"(?<spark_app_name>SparkSQL::{PATTERN.IPV4})")
parser.add_var("SPARK_HOST_IP_PORT", rf"(?<spark_host>spark\-{PATTERN.INT})/(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})")
parser.add_var(
"SYSTEM_EXCEPTION",
rf"(?<system_exception_type>({PATTERN.JAVA_PACKAGE_SEGMENT})+[{PATTERN.JAVA_IDENTIFIER_CHARSET}]*Exception): "
rf"(?<system_exception_msg>{PATTERN.LOG_LINE})"
)
parser.add_var(
    "SYSTEM_STACK_TRACE", rf"\s{{1,4}}at (?<system_stack>{PATTERN.JAVA_STACK_LOCATION})"
)
parser.add_var("IP_PORT", rf"(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})")
parser.compile()
# Stream parsing: iterate over multi-line log events
for idx, event in enumerate(parser.parse(SAMPLE_LOGS)):
print(f"log-event-{idx} log template type:{event.get_log_type().strip()}")
```
**Output:**
```
log-event-0 log template type:<timestamp> <level> master.Master: Registering app <spark_app_name>
log-event-1 log template type:<timestamp> <level> server.TransportChannelHandler: Exception in connection from <spark_host>/<system_ip>:<system_port>
<system_exception_type>: <system_exception_msg><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack>
log-event-2 log template type:<timestamp> <level> master.Master: <system_ip>:<system_port> got disassociated, removing it.<newLine>
```
The parser successfully separated the log stream into **three distinct events** using timestamps as
contextual anchors:
* **Event 0**: Single-line app registration log
* **Event 1**: Multi-line exception with 15 stack trace lines (demonstrates how timestamps bind
multi-line events together)
* **Event 2**: Single-line disassociation log
Each log type shows the template structure with variable placeholders (`<level>`, `<system_ip>`,
etc.), enabling pattern-based log analysis and grouping.
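Because `parse()` also accepts file objects (see the [Parser reference](#parser)), you can stream
directly from a log file instead of an in-memory string. A minimal sketch, assuming the compiled
`parser` from above and a hypothetical `spark.log` file containing the sample logs:

```python
# Stream events straight from a file on disk.
with open("spark.log", "r") as log_file:  # "spark.log" is hypothetical
    for idx, event in enumerate(parser.parse(log_file)):
        print(f"log-event-{idx}: {event.get_log_type().strip()}")
```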
---
#### Using `PATTERN` constants
The `PATTERN` class provides pre-built regex patterns for common log elements like IP addresses,
UUIDs, numbers, and file paths. See the [PATTERN reference](#pattern) for the complete list of
available patterns.
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
parser.add_var("network", rf"IP: (?<ip>{PATTERN.IPV4}) UUID: (?<id>{PATTERN.UUID})")
parser.add_var("metrics", rf"value=(?<value>{PATTERN.FLOAT})")
parser.compile()
log_line = "IP: 192.168.1.1 UUID: 550e8400-e29b-41d4-a716-446655440000 value=42.5"
event = parser.parse_event(log_line)
print(f"IP: {event['ip']}")
print(f"UUID: {event['id']}")
print(f"Value: {event['value']}")
```
**Output:**
```
IP: 192.168.1.1
UUID: 550e8400-e29b-41d4-a716-446655440000
Value: 42.5
```
---
#### Export to DataFrame
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var(
"metric",
rf"metric=(?<metric_name>\w+) value=(?<value>\d+)"
)
parser.compile()
log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
"""
# Create a query and export to DataFrame
query = (
Query(parser)
.select(["metric_name", "value"])
.from_(log_data)
.validate_query()
)
df = query.to_dataframe()
print(df)
```
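Assuming all three lines match, the printed DataFrame should look roughly like this (exact spacing
depends on pandas):

```
  metric_name value
0         cpu    42
1      memory   100
2        disk     7
```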
---
#### Filtering events
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()
log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
2024-01-01 INFO: metric=cpu value=85
"""
# Filter events where value > 50
query = (
Query(parser)
.select(["metric_name", "value"])
.from_(log_data)
.filter(lambda event: int(event['value']) > 50)
.validate_query()
)
df = query.to_dataframe()
print(df)
# Output:
# metric_name value
# 0 memory 100
# 1 cpu 85
```
---
#### Including log template type and log message
Use the special fields `@log_type` and `@log_message` to include the log template and the original
message alongside extracted variables:
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.compile()
log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 WARN: Processing value=100
"""
# Select log type, message, and all variables
query = (
Query(parser)
.select(["@log_type", "@log_message", "*"])
.from_(log_data)
.validate_query()
)
df = query.to_dataframe()
print(df)
# Output:
# @log_type @log_message value
# 0 <timestamp> INFO: Processing <metric> 2024-01-01 INFO: Processing value=42 42
# 1 <timestamp> WARN: Processing <metric> 2024-01-01 WARN: Processing value=100 100
```
The `"*"` wildcard expands to all variables defined in the schema and can be combined with other fields like `@log_type` and `@log_message`.
---
#### Analyzing log types
Discover and analyze log patterns in your data using log type analysis methods:
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.add_var("status", rf"status=(?<status>\w+)")
parser.compile()
log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 INFO: Processing value=100
2024-01-01 WARN: System status=degraded
2024-01-01 INFO: Processing value=7
2024-01-01 ERROR: System status=failed
"""
query = Query(parser).from_(log_data)
# Get all unique log types
print("Unique log types:")
for log_type in query.get_log_types():
print(f" {log_type}")
# Reset stream for next analysis
query.from_(log_data)
# Get log type occurrence counts
print("\nLog type counts:")
counts = query.get_log_type_counts()
for log_type, count in sorted(counts.items(), key=lambda x: -x[1]):
print(f" {count:3d} {log_type}")
# Reset stream for next analysis
query.from_(log_data)
# Get sample messages for each log type
print("\nLog type samples:")
samples = query.get_log_type_with_sample(sample_size=2)
for log_type, messages in samples.items():
print(f" {log_type}")
for msg in messages:
print(f" - {msg.strip()}")
```
**Output:**
```
Unique log types:
<timestamp> INFO: Processing <metric>
<timestamp> WARN: System <status>
<timestamp> ERROR: System <status>
Log type counts:
3 <timestamp> INFO: Processing <metric>
1 <timestamp> WARN: System <status>
1 <timestamp> ERROR: System <status>
Log type samples:
<timestamp> INFO: Processing <metric>
- 2024-01-01 INFO: Processing value=42
- 2024-01-01 INFO: Processing value=100
<timestamp> WARN: System <status>
- 2024-01-01 WARN: System status=degraded
<timestamp> ERROR: System <status>
- 2024-01-01 ERROR: System status=failed
```
---
## Key concepts
> **CRITICAL: You must understand these concepts to use `log-surgeon` correctly.**
>
> `log-surgeon` works **fundamentally differently** from traditional regex engines like Python's
> `re` module, PCRE, or JavaScript regex. Skipping this section may lead to patterns that don't
> work as expected.
### Token-based parsing and delimiters
**CRITICAL:** `log-surgeon` uses **token-based** parsing, not character-based regex matching like
traditional regex engines. This is the most important difference that affects how patterns work.
#### How tokenization works
Delimiters are characters used to split log messages into tokens. The default delimiters include:
- Whitespace: space, tab (`\t`), newline (`\n`), carriage return (`\r`)
- Punctuation: `:`, `,`, `!`, `;`, `%`, `@`, `/`, `(`, `)`, `[`, `]`
For example, with default delimiters, the log message:
```
"abc def ghi"
```
is tokenized into three tokens: `["abc", "def", "ghi"]`
You can customize delimiters when creating a Parser:
```python
parser = Parser(delimiters=r" \t\n,:") # Custom delimiters
```
#### Token-based pattern matching
**Critical:** Patterns like `.*` only match **within a single token**, not across multiple tokens or delimiters.
```python
from log_surgeon import Parser
parser = Parser() # Default delimiters include space
parser.add_var("token", rf"(?<match>d.*)")
parser.compile()
# With "abc def ghi" tokenized as ["abc", "def", "ghi"]
event = parser.parse_event("abc def ghi")
# Matches only "def" (single token starting with 'd')
# Does NOT match "def ghi" (would cross token boundary)
print(event['match']) # Output: "def"
```
**In a traditional regex engine**, `d.*` would match `"def ghi"` (everything from 'd' to end).
**In log-surgeon**, `d.*` matches only `"def"` because patterns cannot cross delimiter boundaries.
#### Why token-based?
Token-based parsing enables:
- **Faster parsing** by reducing search space
- **Predictable behavior** aligned with log structure
- **Efficient log type generation** for analytics
#### Working with token boundaries
To match across multiple tokens, use **character classes** that include the delimiter, like
`[a-z ]*`, instead of `.*`:
```python
from log_surgeon import Parser
parser = Parser() # Default delimiters include space
# Using .* - only matches within a single token
parser.add_var("wrong", rf"(?<match>d.*)") # Matches only "def"
# Using character classes - matches across tokens
parser.add_var("correct", rf"(?<match>d[a-z ]*i)") # Matches "def ghi"
parser.compile()
event = parser.parse_event("abc def ghi")
print(event['match']) # Output: "def ghi"
```
**Key Rule:** Character classes that include delimiter characters, such as `[a-z ]*` or `[\w\s]*`,
can match across token boundaries; `.*` cannot.
#### Alternation requires grouping
**CRITICAL:** Alternation (`|`) works differently in log-surgeon compared to traditional regex
engines. You **must** use parentheses to group alternatives.
```python
from log_surgeon import Parser
parser = Parser()
# WRONG: Without grouping - matches "ab" AND ("c" OR "d") AND "ef"
parser.add_var("wrong", rf"(?<word>abc|def)")
# In log-surgeon, this is interpreted as: "ab" + "c|d" + "ef"
# Matches: "abcef" or "abdef" (NOT "abc" or "def")
# CORRECT: With grouping - matches "abc" OR "def"
parser.add_var("correct", rf"(?<word>(abc)|(def))")
# Matches: "abc" or "def"
parser.compile()
```
**In traditional regex engines**, `abc|def` means "abc" OR "def".
**In log-surgeon**, `abc|def` means "ab" + ("c" OR "d") + "ef".
**Key Rule:** Always use `(abc)|(def)` syntax for alternation to match complete alternatives.
```python
# More examples:
parser.add_var("level", rf"(?<level>(ERROR)|(WARN)|(INFO))") # Correct
parser.add_var("status", rf"(?<status>(success)|(failure))") # Correct
parser.add_var("bad", rf"(?<status>success|failure)") # Wrong - unexpected behavior
```
#### Optional patterns
For optional patterns, use `{0,1}` instead of `*`:
```python
from log_surgeon import Parser
parser = Parser()
# Avoid using * for optional patterns (matches 0 or more)
parser.add_var("avoid", rf"(?<level>(ERROR)|(WARN))*") # Can match empty string or multiple reps
# Do not use ? for optional patterns
parser.add_var("avoid2", rf"(?<level>(ERROR)|(WARN))?") # May not work as expected
# Use {0,1} for optional patterns (matches 0 or 1)
parser.add_var("optional", rf"(?<level>(ERROR)|(WARN)){0,1}") # Matches 0 or 1 occurrence
parser.compile()
```
**Best practice:** Use `{0,1}` for optional elements (doubled to `{{0,1}}` inside an f-string).
Avoid `*` (0 or more) and `?` for optional matching.
You can also explicitly include delimiters in your pattern:
```python
# To match "def ghi", explicitly include the space delimiter
parser.add_var("multi", rf"(?<match>d\w+\s+\w+)")
# This matches "def " as one token segment, followed by "ghi"
```
Or adjust your delimiters to change tokenization behavior:
```python
# Use only newline as delimiter to treat entire lines as tokens
parser = Parser(delimiters=r"\n")
```
### Named capture groups
Use named capture groups in regex patterns to extract specific fields:
```python
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
```
The syntax `(?<name>pattern)` creates a capture group that can be accessed as `event['name']`.
**Note:** See [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for best practices on
writing regex patterns.
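For instance, a minimal end-to-end sketch (the pattern and field names mirror the Query examples
later in this document):

```python
from log_surgeon import Parser

parser = Parser()
# Two named groups, `metric_name` and `value`, become accessible fields.
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

event = parser.parse_event("2024-01-01 INFO: metric=cpu value=42\n")
print(event["metric_name"])  # cpu
print(event["value"])        # 42
```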
### Using raw f-strings for regex patterns
> **⚠️ STRONGLY RECOMMENDED: Use raw f-strings (`rf"..."`) for all regex patterns.**
>
> While not absolutely required, using regular strings will likely cause escaping issues and pattern
> failures. Raw f-strings prevent these problems.
Raw f-strings combine the benefits of:
- **Raw strings (`r"..."`)**: No need to double-escape regex special characters like `\d`, `\w`,
`\n`
- **f-strings (`f"..."`)**: Easy interpolation of variables and pattern constants
#### Why use raw f-strings?
```python
# Without raw strings - requires double-escaping
parser.add_var("metric", "value=(\\d+)") # Hard to read, error-prone
# With raw f-strings - single escaping, clean and readable
parser.add_var("metric", rf"value=(?<value>\d+)")
```
#### Watch out for braces in f-strings
When using f-strings, literal `{` and `}` characters must be escaped by doubling them:
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
# Correct: Escape literal braces in regex
parser.add_var("json", rf"data={{(?<content>[^}}]+)}}") # Matches: data={...}
parser.add_var("range", rf"range={{(?<min>\d+),(?<max>\d+)}}") # Matches: range={10,20}
# Using PATTERN constants with interpolation
parser.add_var("ip", rf"IP: (?<ip>{PATTERN.IPV4})")
parser.add_var("float", rf"value=(?<val>{PATTERN.FLOAT})")
# Common regex patterns
parser.add_var("digits", rf"\d+ items") # No double-escaping needed
parser.add_var("word", rf"name=(?<name>\w+)")
parser.add_var("whitespace", rf"split\s+by\s+spaces")
parser.compile()
```
#### Examples: raw f-strings vs regular strings
```python
# Regular string - requires double-escaping
parser.add_var("path", "path=(?<path>\\w+/\\w+)") # Hard to read
# Raw f-string - natural regex syntax
parser.add_var("path", rf"path=(?<path>\w+/\w+)") # Clean and readable
# With interpolation
log_level = "INFO|WARN|ERROR"
parser.add_var("level", rf"(?<level>{log_level})") # Easy to compose
```
**Recommendation:** Consistently use `rf"..."` for all regex patterns. This approach:
- Avoids double-escaping mistakes that break patterns
- Makes patterns more readable
- Allows easy use of `PATTERN` constants and variables
- Only requires watching for literal braces `{` and `}` in f-strings (escape as `{{` and `}}`)
Using regular strings (`"..."`) will require double-escaping (e.g., `"\\d+"`) which is error-prone
and can be hard to read.
### Logical vs. physical names
Internally, log-surgeon uses "physical" names (e.g., `CGPrefix0`, `CGPrefix1`) for capture groups,
while you work with "logical" names (e.g., `user_id`, `thread`). The `GroupNameResolver` handles
this mapping automatically.
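You normally never interact with this mapping directly, but the following sketch illustrates the
behavior using the `GroupNameResolver` API from the [reference](#groupnameresolver) (the import
path and prefix string are assumptions):

```python
from log_surgeon import GroupNameResolver  # import path assumed

resolver = GroupNameResolver("CGPrefix")  # prefix for generated physical names

# Each call mints a fresh physical name for a logical name.
physical = resolver.create_new_physical_name("user_id")
print(physical)                                # e.g., "CGPrefix0"
print(resolver.get_logical_name(physical))     # "user_id"
print(resolver.get_physical_names("user_id"))  # e.g., {"CGPrefix0"}
```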
### Schema format
The schema defines delimiters, timestamps, and variables for parsing:
```
// schema delimiters
delimiters: \t\r\n:,!;%@/\(\)\[\]
// schema timestamps
timestamp:<timestamp_regex>
// schema variables
variable_name:<variable_regex>
```
When using the fluent API (`Parser.add_var()` and `Parser.compile()`), the schema is built automatically.
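If you need the schema string itself, for inspection or for `Parser.load_schema()`, you can build
it explicitly with the `SchemaCompiler` described in the [reference](#schemacompiler). A minimal
sketch (import path assumed):

```python
from log_surgeon import Parser, SchemaCompiler  # import path assumed

compiler = SchemaCompiler()
compiler.add_timestamp("TS", rf"\d{{4}}-\d{{2}}-\d{{2}}")
compiler.add_var("metric", rf"value=(?<value>\d+)")

schema = compiler.compile()  # the raw schema string in the format shown above
resolver = compiler.get_capture_group_name_resolver()

parser = Parser()
parser.load_schema(schema, resolver)  # configure the parser from the schema
```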
### Common pitfalls
**Pattern doesn't match anything**
- Check: Are you using `.*` to match across tokens? Use `[a-zA-Z ]*` instead
- Check: Did you forget to call `parser.compile()`?
- Check: Are your delimiters splitting tokens unexpectedly?
**Alternation not working (`abc|def`)**
- Problem: `(?<name>abc|def)` doesn't match "abc" or "def" as expected
- Solution: Use `(?<name>(abc)|(def))` with explicit grouping
**Pattern works in regex tester but not here**
- Remember: log-surgeon is token-based, not character-based
- Traditional regex engines match across entire strings
- log-surgeon matches within token boundaries (delimited by spaces, colons, etc.)
- Read: [Token-Based Parsing](#token-based-parsing-and-delimiters)
**Escape sequence errors in Python**
- Problem: `parser.add_var("digits", "(?<num>\d+)")` raises SyntaxError
- Solution: Use `rf"..."` (raw f-string) instead of `"..."` or `f"..."`
- Example: `parser.add_var("digits", rf"(?<num>\d+)")`
**Optional pattern matching incorrectly**
- Problem: Using `?` or `*` for optional patterns
- Solution: Use `{0,1}` for optional elements
- Example: `(?<level>(ERROR)|(WARN)){0,1}` for optional log level
---
## Reference
| Task | Syntax |
|------|--------|
| Named capture | `(?<name>pattern)` |
| Alternation | `(?<name>(opt1)\|(opt2))` (NOT `opt1\|opt2`) |
| Optional | `{0,1}` (NOT `?` or `*`) |
| Match across tokens | Use `[a-z ]*` (NOT `.*`) |
| Pattern string | `rf"..."` (raw f-string recommended) |
| All variables | `.select(["*"])` |
| Log type | `.select(["@log_type"])` |
| Original message | `.select(["@log_message"])` |
### Parser
High-level parser for extracting structured data from unstructured log messages.
#### Constructor
- `Parser(delimiters: str = r" \t\r\n:,!;%@/\(\)\[\]")`
- Initialize a parser with optional custom delimiters
- Default delimiters include space, tab, newline, and common punctuation
#### Methods
- `add_var(name: str, regex: str, hide_var_name_if_named_group_present: bool = True) -> Parser`
- Add a variable pattern to the parser's schema
- Supports named capture groups using `(?<name>)` syntax
- Use raw f-strings (`rf"..."`) for regex patterns (see [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns))
- Returns self for method chaining
- `add_timestamp(name: str, regex: str) -> Parser`
- Add a timestamp pattern to the parser's schema
- Returns self for method chaining
- `compile(enable_debug_logs: bool = False) -> None`
- Build and initialize the parser with the configured schema
- Must be called after adding variables and before parsing
- Set `enable_debug_logs=True` to output debug information to stderr
- `load_schema(schema: str, group_name_resolver: GroupNameResolver) -> None`
- Load a pre-built schema string to configure the parser
- `parse(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Generator[LogEvent, None, None]`
- Parse all log events from a string, file object, or stream
- Accepts strings, text/binary file objects, StringIO, or BytesIO
- Yields LogEvent objects for each parsed event
- `parse_event(payload: str) -> LogEvent | None`
- Parse a single log event from a string (convenience method)
- Wraps `parse()` and returns the first event
- Returns LogEvent or None if no event found
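Since `add_var()` and `add_timestamp()` return `self`, a parser can be configured fluently;
`compile()` returns `None`, so call it last. A short sketch:

```python
from log_surgeon import Parser, PATTERN

parser = Parser()
(
    parser.add_timestamp("TS", rf"\d{{4}}-\d{{2}}-\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
    .add_var("ip", rf"(?<ip>{PATTERN.IPV4})")
)
parser.compile()
```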
### LogEvent
Represents a parsed log event with extracted variables.
#### Methods
- `get_log_message() -> str`
- Get the original log message
- `get_log_type() -> str`
- Get the generated log type (template) with logical group names
- `get_capture_group(logical_capture_group_name: str, raw_output: bool = False) -> str | list | None`
- Get the value of a capture group by its logical name
- If `raw_output=False` (default), single values are unwrapped from lists
- Returns None if capture group not found
- `get_capture_group_str_representation(field: str, raw_output: bool = False) -> str`
- Get the string representation of a capture group value
- `get_resolved_dict() -> dict[str, str | list]`
- Get a dictionary with all capture groups using logical (user-defined) names
- Physical names (CGPrefix*) are converted to logical names
- Timestamp fields are consolidated under "timestamp" key
- Single-value lists are unwrapped to scalar values
- "@LogType" is excluded from the output
- `__getitem__(key: str) -> str | list`
- Access capture group values by name (e.g., `event['field_name']`)
- Shorthand for `get_capture_group(key, raw_output=False)`
- `__str__() -> str`
- Get formatted JSON representation of the log event with logical group names
- Uses `get_resolved_dict()` internally
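A short sketch of the accessors above, assuming the `metric` parser from the Query examples:

```python
event = parser.parse_event("2024-01-01 INFO: metric=cpu value=42\n")

print(event["value"])  # "42" (shorthand for get_capture_group)
print(event.get_capture_group("value", raw_output=True))  # ["42"] (list kept)
print(event.get_resolved_dict())  # all capture groups with logical names
```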
### Query
Query builder for parsing log events into structured data formats.
#### Constructor
- `Query(parser: Parser)`
- Initialize a query with a configured parser
#### Methods
- `select(fields: list[str]) -> Query`
- Select fields to extract from log events
- Supports variable names, `"*"` for all variables, `"@log_type"` for log type, and `"@log_message"` for original message
- The `"*"` wildcard can be combined with other fields (e.g., `["@log_type", "*"]`)
- Returns self for method chaining
- `filter(predicate: Callable[[LogEvent], bool]) -> Query`
- Filter log events using a predicate function
- Predicate receives a LogEvent and returns True to include it, False to exclude
- Returns self for method chaining
- Example: `query.filter(lambda event: int(event['value']) > 50)`
- `from_(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
- Set the input source to parse
- Accepts strings, text/binary file objects, StringIO, or BytesIO
- Strings are automatically wrapped in StringIO
- Returns self for method chaining
- `select_from(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
- Alias for `from_()`
- Returns self for method chaining
- `validate_query() -> Query`
- Validate that the query is properly configured
- Returns self for method chaining
- `to_dataframe() -> pd.DataFrame`
- Convert parsed events to a pandas DataFrame
- `to_df() -> pd.DataFrame`
- Alias for `to_dataframe()`
- `to_arrow() -> pa.Table`
- Convert parsed events to a PyArrow Table
- `to_pa() -> pa.Table`
- Alias for `to_arrow()`
- `get_rows() -> list[list]`
- Extract rows of field values from parsed events
- `get_vars() -> KeysView[str]`
- Get all variable names (logical capture group names) defined in the schema
- `get_log_types() -> Generator[str, None, None]`
- Get all unique log types from parsed events
- Yields log types in the order they are first encountered
- Useful for discovering log patterns in your data
- `get_log_type_counts() -> dict[str, int]`
- Get count of occurrences for each unique log type
- Returns dictionary mapping log types to their counts
- Useful for analyzing log type distribution
- `get_log_type_with_sample(sample_size: int = 3) -> dict[str, list[str]]`
- Get sample log messages for each unique log type
- Returns dictionary mapping log types to lists of sample messages
- Useful for understanding what actual messages match each template
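A compact sketch tying several of these methods together:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
"""

query = (
    Query(parser)
    .select(["@log_type", "*"])
    .from_(log_data)
    .filter(lambda event: event["metric_name"] == "cpu")
    .validate_query()
)
table = query.to_arrow()  # PyArrow Table; use to_dataframe() for pandas
print(table)
```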
### SchemaCompiler
Compiler for constructing log-surgeon schema definitions.
#### Constructor
- `SchemaCompiler(delimiters: str = DEFAULT_DELIMITERS)`
- Initialize a schema compiler with optional custom delimiters
#### Methods
- `add_var(name: str, regex: str, hide_var_name_if_named_group_present: bool = True) -> SchemaCompiler`
- Add a variable pattern to the schema
- Returns self for method chaining
- `add_timestamp(name: str, regex: str) -> SchemaCompiler`
- Add a timestamp pattern to the schema
- Returns self for method chaining
- `remove_var(var_name: str) -> SchemaCompiler`
- Remove a variable from the schema
- Returns self for method chaining
- `get_var(var_name: str) -> Variable`
- Get a variable by name
- `compile() -> str`
- Compile the final schema string
- `get_capture_group_name_resolver() -> GroupNameResolver`
- Get the resolver for mapping logical to physical capture group names
### GroupNameResolver
Bidirectional mapping between logical (user-defined) and physical (auto-generated) group names.
#### Constructor
- `GroupNameResolver(physical_name_prefix: str)`
- Initialize with a prefix for auto-generated physical names
#### Methods
- `create_new_physical_name(logical_name: str) -> str`
- Create a new unique physical name for a logical name
- Each call generates a new physical name
- `get_physical_names(logical_name: str) -> set[str]`
- Get all physical names associated with a logical name
- `get_logical_name(physical_name: str) -> str`
- Get the logical name for a physical name
- `get_all_logical_names() -> KeysView[str]`
- Get all logical names that have been registered
### PATTERN
Collection of pre-built regex patterns optimized for log parsing. These patterns follow log-surgeon's syntax requirements and are ready to use with named capture groups.
#### Available patterns
**Network Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.UUID` | UUID (Universally Unique Identifier) | `550e8400-e29b-41d4-a716-446655440000` |
| `PATTERN.IP_OCTET` | Single IPv4 octet (0-255) | `192`, `10`, `255` |
| `PATTERN.IPV4` | IPv4 address | `192.168.1.1`, `10.0.0.1` |
| `PATTERN.PORT` | Network port number (1-5 digits) | `80`, `8080`, `65535` |
**Numeric Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.INT` | Integer with optional negative sign | `42`, `-123`, `0` |
| `PATTERN.FLOAT` | Float with optional negative sign | `3.14`, `-123.456`, `0.5` |
**File System Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.LINUX_FILE_NAME_CHARSET` | Character set for Linux file names | `a-zA-Z0-9 ._-` |
| `PATTERN.LINUX_FILE_NAME` | Linux file name | `app.log`, `config-2024.yaml` |
| `PATTERN.LINUX_FILE_PATH` | Linux file path (relative) | `logs/app.log`, `var/log/system.log` |
**Character Sets and Word Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.JAVA_IDENTIFIER_CHARSET` | Java identifier character set | `a-zA-Z0-9_` |
| `PATTERN.JAVA_IDENTIFIER` | Java identifier | `myVariable`, `$value`, `Test123` |
| `PATTERN.LOG_LINE_CHARSET` | Common log line characters | Alphanumeric + symbols + whitespace |
| `PATTERN.LOG_LINE` | General log line content | `Error: connection timeout` |
| `PATTERN.LOG_LINE_NO_WHITE_SPACE_CHARSET` | Log line chars without whitespace | Alphanumeric + symbols only |
| `PATTERN.LOG_LINE_NO_WHITE_SPACE` | Log content without spaces | `ERROR`, `/var/log/app.log` |
**Java-Specific Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.JAVA_LITERAL_CHARSET` | Java literal character set | `a-zA-Z0-9_$` |
| `PATTERN.JAVA_PACKAGE_SEGMENT` | Single Java package segment | `com.`, `example.` |
| `PATTERN.JAVA_CLASS_NAME` | Java class name | `MyClass`, `ArrayList` |
| `PATTERN.JAVA_FULLY_QUALIFIED_CLASS_NAME` | Fully qualified class name | `java.util.ArrayList` |
| `PATTERN.JAVA_LOGGING_CODE_LOCATION_HINT` | Java logging location hint | `~[MyClass.java:42?]` |
| `PATTERN.JAVA_STACK_LOCATION` | Java stack trace location | `java.util.ArrayList.add(ArrayList.java:123)` |
#### Example usage
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
# Network patterns
parser.add_var("network", rf"IP: (?<ip>{PATTERN.IPV4}) Port: (?<port>{PATTERN.PORT})")
# Numeric patterns
parser.add_var("metrics", rf"value=(?<value>{PATTERN.FLOAT}) count=(?<count>{PATTERN.INT})")
# File system patterns
parser.add_var("file", rf"Opening (?<filepath>{PATTERN.LINUX_FILE_PATH})")
# Java patterns
parser.add_var("exception", rf"at (?<stack>{PATTERN.JAVA_STACK_LOCATION})")
parser.compile()
```
#### Composing patterns
PATTERN constants can be composed to build more complex patterns:
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
# Combine multiple patterns
parser.add_var(
"server_info",
rf"Server (?<name>{PATTERN.JAVA_IDENTIFIER}) at (?<ip>{PATTERN.IPV4}):(?<port>{PATTERN.PORT})"
)
# Use character sets to build custom patterns
parser.add_var(
"custom_id",
rf"ID-(?<id>[{PATTERN.JAVA_IDENTIFIER_CHARSET}]+)"
)
parser.compile()
```
---
## Development
### Building from source
```bash
# Clone the repository
git clone https://github.com/y-scope/log-surgeon-ffi-py.git
cd log-surgeon-ffi-py
# Install the project in editable mode
pip install -e .
# Build the extension
cmake -S . -B build
cmake --build build
```
### Running tests
```bash
# Install test dependencies
pip install pytest
# Run tests
python -m pytest tests/
```
---
## License
Apache License 2.0 - See [LICENSE](LICENSE) for details.
---
## Links
- [Homepage](https://github.com/y-scope/log-surgeon-ffi-py)
- [Bug Tracker](https://github.com/y-scope/log-surgeon-ffi-py/issues)
- [log-surgeon C++ library](https://github.com/y-scope/log-surgeon)
---
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Raw data
{
"_id": null,
"home_page": null,
"name": "log-surgeon-ffi",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "logging, log-parsing, log-analysis, structured-data, performance, observability",
"author": null,
"author_email": "y-scope <info@yscope.com>",
"download_url": null,
"platform": null,
"description": "# `log-surgeon-ffi`\n\n`log-surgeon-ffi` provides Python foreign function interface (FFI) bindings for\n[`log-surgeon`](https://github.com/y-scope/log-surgeon).\n\n---\n\n## Quick navigation\n\n[**Overview**](#overview)\n* [Why `log-surgeon`?](#why-log-surgeon)\n* [Key capabilities](#key-capabilities)\n* [Structured output and downstream capabilities](#structured-output-and-downstream-capabilities)\n* [When to use `log-surgeon`](#when-to-use-log-surgeon)\n\n[**Getting started**](#getting-started)\n* [System requirements](#system-requirements)\n* [Installation](#installation)\n* [First steps](#first-steps)\n* [Important prerequisites](#important-prerequisites)\n* [Quick start examples](#quick-start-examples)\n\n[**Key concepts**](#key-concepts)\n* [Token-based parsing and delimiters](#token-based-parsing-and-delimiters)\n* [Named capture groups](#named-capture-groups)\n* [Using raw f-strings for regex patterns](#using-raw-f-strings-for-regex-patterns)\n\n[**Reference**](#reference)\n* [Parser API](#parser)\n* [Query API](#query)\n* [PATTERN constants](#pattern)\n\n[**Development**](#development)\n* [Building from source](#building-from-source)\n* [Running tests](#running-tests)\n\n---\n\n## Overview\n\n[`log-surgeon`](https://github.com/y-scope/log-surgeon), is a high-performance C++ library that\nenables efficient extraction of structured information from unstructured log files.\n\n### Why `log-surgeon`?\n\nTraditional regex engines are often slow to execute, prone to errors, and costly to maintain. For\nexample, Meta uses RE2 (a state-of-the-art regex engine) to parse logs, but they still face\nscalability and maintenance challenges, which limits extraction to a small set of fields such as\ntimestamps, levels, and component names.\n\n`log-surgeon` streamlines the process by identifying, extracting, and labeling variable values with\nsemantic context, and then inferring a log template in a single pass. `log-surgeon` is also built to\naccommodate structural variability. Values may shift position, appear multiple times, or change order\nentirely, but with `log-surgeon`, you simply define the variable patterns, and `log-surgeon`\nJIT-compiles a tagged-DFA state machine to drive the full pipeline.\n\n### Key capabilities\n\n* **Extract variables** from log messages using regex patterns with named capture groups\n* **Generate log types** (templates) automatically for log analysis\n* **Parse streams** efficiently for large-scale log processing\n* **Export data** to pandas DataFrames and PyArrow Tables\n\n### Structured output and downstream capabilities\n\nUnstructured log data is automatically transformed into structured semantic representations.\n\n* **Log types (templates)**: Variables are replaced with placeholders to form reusable templates.\n For example, roughly 200,000 Spark log messages can reduce to about 55 distinct templates, which\n supports pattern analysis and anomaly detection.\n\n* **Semantic Variables**: Extracted key-value pairs with semantic context (e.g., `app_id`,\n `app_name`, `worker_id`) can be used directly for analysis.\n\nThis structured output unlocks powerful downstream capabilities:\n\n* **Knowledge graph construction.** Build relationship graphs between entities extracted from logs\n (e.g., linking `app_id` \u2192 `app_name` \u2192 `worker_id`). 
The structured output fits tools such as\n [Stitch](https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhao), which\n uses flow reconstruction from logs to perform non-intrusive performance profiling and debugging\n across distributed systems.\n\n* **Template-based summarization.** Compress massive datasets into compact template sets for human\n and agent consumption. Templates act as natural tokens for LLMs. Instead of millions of raw lines,\n provide a small number of distinct templates with statistics.\n\n* **Hybrid search** Combine free-text search with structured queries. Log types enable\n auto-completion and query suggestions on large datasets. Instead of searching through millions of\n raw log lines, search across a compact set of templates first. Then project and filter on\n structured variables (e.g., `status == \"ERROR\"`, `response_time > 1000`), and aggregate for\n analysis.\n\n* **Agentic automation.** Agents can query by template, analyze variable distributions, identify\n anomalies, and automate debugging tasks using structured signals rather than raw text.\n\n### When to use `log-surgeon`\n\n**Good fit**\n* Large-scale log processing (millions of lines)\n* Extracting structured data from semi-structured logs\n* Generating log templates for analytics\n* Multi-line log events (stack traces, JSON dumps)\n* Performance-critical parsing\n\n**Not ideal**\n* Simple one-off text extraction (use Python `re` module)\n* Highly irregular text without consistent delimiters\n* Patterns requiring full PCRE features (lookahead, backreferences)\n\n---\n\n## Getting started\n\nFollow the instructions below to get started with `log-surgeon-ffi`.\n\n### System requirements\n\n- Python >= 3.9\n- pandas\n- pyarrow\n\n#### Build requirements\n\n- C++20 compatible compiler\n- CMake >= 3.15\n\n### Installation\n\nTo install the library with pandas and PyArrow support for DataFrame/Arrow table exports, run the\nfollowing command:\n\n```bash\npip install log-surgeon-ffi\n```\n\nTo verify your installation, run the following command:\n\n```bash\npython -c \"from log_surgeon import Parser; print('Installation successful.')\"\n```\n\n**Note:** If you only need core parsing without DataFrame or Arrow exports, you can install a\nminimal environment, although pandas and PyArrow are included by default for convenience.\n\n### First steps\n\nAfter installation, follow these steps:\n\n1. **Read [Key Concepts](#key-concepts).** Token based parsing differs from traditional regex.\n2. **Run a [Quick start example](#quick-start-examples)** to see how it works.\n3. **Use `rf\"...\"` for patterns** to avoid escaping issues. See\n [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns).\n4. **Check out [examples/](examples/)** to study some complete working examples.\n\n---\n\n> ### Important prerequisites\n> \n> `log-surgeon` uses token-based parsing, and its regex behavior differs from traditional engines.\n> Read the [Key Concepts](#key-concepts) section before writing patterns.\n> \n> Critical differences between token-based parsing and traditional regex behavior:\n> \n> * `.*` only matches within a single token (not across delimiters)\n> * `abc|def` requires grouping: use `(abc)|(def)` instead\n> * Use `{0,1}` for optional patterns, NOT `?`\n> \n> **Tip:** Use raw f-strings (`rf\"...\"`) for regex patterns. 
See \n> [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for more details.\n\n---\n\n### Quick start examples\n\nUse the following examples to get started.\n\n#### Basic parsing\n\nThe following code parses a simple log event with `log-surgeon`.\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\n# Parse a sample log event\nlog_line = \"16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\\n\"\n\n# Create a parser and define extraction patterns\nparser = Parser()\nparser.add_var(\"resource\", rf\"(?<memory_gb>{PATTERN.FLOAT}) GiB ram\")\nparser.compile()\n\n# Parse a single event\nevent = parser.parse_event(log_line)\n\n# Access extracted data\nprint(f\"Message: {event.get_log_message().strip()}\")\nprint(f\"LogType: {event.get_log_type().strip()}\")\nprint(f\"Parsed Logs: {event}\")\n```\n\n**Output:**\n```\nMessage: 16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\nLogType: 16/05/04 04:24:58 INFO Registering worker with 1 core and <memory_gb> GiB ram\nParsed Logs: {\n \"memory_gb\": \"4.0\"\n}\n```\n\nWe can see that the parser extracted structured data from the unstructured log line:\n* ***Message**: The original log line\n* **LogType**: Template with variable placeholder `<memory_gb>` showing the pattern structure\n* **Parsed variables**: Successfully extracted `memory_gb` value of \"4.0\" from the pattern match\n\n#### Try it yourself\n\nCopy this code and modify the pattern to extract both `memory_gb` AND `cores`:\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\nlog_line = \"16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\\n\"\nparser = Parser()\n# TODO: Add pattern to capture both \"1\" (cores) and \"4.0\" (memory_gb)\nparser.add_var(\"resource\", rf\"...\")\nparser.compile()\n\nevent = parser.parse_event(log_line)\nprint(f\"Cores: {event['cores']}, Memory: {event['memory_gb']}\")\n```\n\n<details>\n<summary>Solution</summary>\n\n```python\nparser.add_var(\"resource\", rf\"(?<cores>\\d+) core and (?<memory_gb>{PATTERN.FLOAT}) GiB ram\")\n```\n</details>\n\n---\n\n#### Multiple capture groups\n\nThe following code parses a more-complex log event.\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\n# Parse a sample log event\nlog_line = \"\"\"16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)\n at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\n at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\n at sun.nio.ch.IOUtil.read(IOUtil.java:192)\n at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\n at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\n at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\n at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\n at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\n at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\n at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\n at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\n at 
java.lang.Thread.run(Thread.java:750)\n\"\"\"\n\n# Create a parser and define extraction patterns\nparser = Parser()\n\n# Add timestamp pattern\nparser.add_timestamp(\"TIMESTAMP_SPARK_1_6\", rf\"\\d{{2}}/\\d{{2}}/\\d{{2}} \\d{{2}}:\\d{{2}}:\\d{{2}}\")\n\n# Add variable patterns\nparser.add_var(\"SYSTEM_LEVEL\", rf\"(?<level>(INFO)|(WARN)|(ERROR))\")\nparser.add_var(\"SPARK_HOST_IP_PORT\", rf\"(?<spark_host>spark\\-{PATTERN.INT})/(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})\")\nparser.add_var(\n \"SYSTEM_EXCEPTION\",\n rf\"(?<system_exception_type>({PATTERN.JAVA_PACKAGE_SEGMENT})+[{PATTERN.JAVA_IDENTIFIER_CHARSET}]*Exception): \"\n rf\"(?<system_exception_msg>{PATTERN.LOG_LINE})\"\n)\nparser.add_var(\n rf\"SYSTEM_STACK_TRACE\",\n rf\"(\\s{{1,4}}at (?<system_stack>{PATTERN.JAVA_STACK_LOCATION})\"\n)\nparser.compile()\n\n# Parse a single event\nevent = parser.parse_event(log_line)\n\n# Access extracted data\nprint(f\"Message: {event.get_log_message().strip()}\")\nprint(f\"LogType: {event.get_log_type().strip()}\")\nprint(f\"Parsed Logs: {event}\")\n```\n\n**Output:**\n```\nMessage: 16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)\n at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\n at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\n at sun.nio.ch.IOUtil.read(IOUtil.java:192)\n at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\n at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\n at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\n at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\n at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\n at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\n at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\n at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\n at java.lang.Thread.run(Thread.java:750)\nLogType: <timestamp> <level> server.TransportChannelHandler: Exception in connection from <spark_host>/<system_ip>:<system_port>\n<system_exception_type>: <system_exception_msg><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine>\nParsed Logs: {\n \"timestamp\": \"16/05/04 12:22:37\",\n \"level\": \"WARN\",\n \"spark_host\": \"spark-35\",\n \"system_ip\": \"192.168.10.50\",\n \"system_port\": \"55392\",\n \"system_exception_type\": \"java.io.IOException\",\n \"system_exception_msg\": \"Connection reset by peer\",\n \"system_stack\": [\n \"sun.nio.ch.FileDispatcherImpl.read0(Native Method)\",\n \"sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\",\n \"sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\",\n \"sun.nio.ch.IOUtil.read(IOUtil.java:192)\",\n 
\"sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\",\n \"io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\",\n \"io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\",\n \"io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\",\n \"io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\",\n \"io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\",\n \"io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\",\n \"io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\",\n \"io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\",\n \"io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\",\n \"java.lang.Thread.run(Thread.java:750)\"\n ]\n}\n```\n\nThe parser extracted **multiple named capture groups** from a complex multi-line Java stack trace:\n* **Scalar fields**: `timestamp`, `level`, `spark_host`, `system_ip`, `system_port`,\n `system_exception_type`, `system_exception_msg`\n* **Array field**: `system_stack` contains all 15 stack trace locations (demonstrates automatic\n aggregation of repeated capture groups)\n* **LogType**: Template shows the structure with `<newLine>` markers indicating line boundaries in\n the original log\n\n---\n\n#### Stream parsing\n\nWhen parsing log streams or files, timestamps are **required** to perform contextual anchoring.\nTimestamps act as delimiters that separate individual log events, enabling the parser to correctly\ngroup multi-line entries (like stack traces) into single events.\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\n# Parse from string (automatically converted to io.StringIO)\nSAMPLE_LOGS = \"\"\"16/05/04 04:31:13 INFO master.Master: Registering app SparkSQL::192.168.10.76\n16/05/04 12:32:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)\n at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\n at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\n at sun.nio.ch.IOUtil.read(IOUtil.java:192)\n at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\n at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\n at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\n at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\n at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\n at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\n at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\n at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\n at java.lang.Thread.run(Thread.java:750)\n16/05/04 04:37:53 INFO master.Master: 192.168.10.76:41747 got disassociated, removing it.\n\"\"\"\n\n# Define parser with patterns\nparser = Parser()\n# REQUIRED: Timestamp acts as contextual anchor to separate individual log events in the stream\nparser.add_timestamp(\"TIMESTAMP_SPARK_1_6\", rf\"\\d{{2}}/\\d{{2}}/\\d{{2}} 
---

#### Using `PATTERN` constants

The `PATTERN` class provides pre-built regex patterns for common log elements like IP addresses,
UUIDs, numbers, and file paths. See the [PATTERN reference](#pattern) for the complete list of
available patterns.

```python
from log_surgeon import Parser, PATTERN

parser = Parser()
parser.add_var("network", rf"IP: (?<ip>{PATTERN.IPV4}) UUID: (?<id>{PATTERN.UUID})")
parser.add_var("metrics", rf"value=(?<value>{PATTERN.FLOAT})")
parser.compile()

log_line = "IP: 192.168.1.1 UUID: 550e8400-e29b-41d4-a716-446655440000 value=42.5"
event = parser.parse_event(log_line)

print(f"IP: {event['ip']}")
print(f"UUID: {event['id']}")
print(f"Value: {event['value']}")
```

**Output:**
```
IP: 192.168.1.1
UUID: 550e8400-e29b-41d4-a716-446655440000
Value: 42.5
```

---

#### Export to DataFrame

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var(
    "metric",
    rf"metric=(?<metric_name>\w+) value=(?<value>\d+)"
)
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
"""

# Create a query and export to a DataFrame
query = (
    Query(parser)
    .select(["metric_name", "value"])
    .from_(log_data)
    .validate_query()
)

df = query.to_dataframe()
print(df)
```

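The same query can be materialized as a PyArrow Table via `to_arrow()` (see the
[Query API](#query)). A minimal sketch reusing the query shape above:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
"""

# Identical builder chain, but produce a PyArrow Table instead of a DataFrame
table = (
    Query(parser)
    .select(["metric_name", "value"])
    .from_(log_data)
    .validate_query()
    .to_arrow()
)
print(table)
```
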
---

#### Filtering events

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
2024-01-01 INFO: metric=cpu value=85
"""

# Filter events where value > 50
query = (
    Query(parser)
    .select(["metric_name", "value"])
    .from_(log_data)
    .filter(lambda event: int(event['value']) > 50)
    .validate_query()
)

df = query.to_dataframe()
print(df)
# Output:
#   metric_name value
# 0      memory   100
# 1         cpu    85
```

---

#### Including log template type and log message

Use the special fields `@log_type` and `@log_message` to include the log template and the original
message alongside extracted variables:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 WARN: Processing value=100
"""

# Select log type, message, and all variables
query = (
    Query(parser)
    .select(["@log_type", "@log_message", "*"])
    .from_(log_data)
    .validate_query()
)

df = query.to_dataframe()
print(df)
# Output:
#                                @log_type                           @log_message value
# 0  <timestamp> INFO: Processing <metric>   2024-01-01 INFO: Processing value=42    42
# 1  <timestamp> WARN: Processing <metric>  2024-01-01 WARN: Processing value=100   100
```

The `"*"` wildcard expands to all variables defined in the schema and can be combined with other
fields like `@log_type` and `@log_message`.

---

#### Analyzing log types

Discover and analyze log patterns in your data using the log type analysis methods:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.add_var("status", rf"status=(?<status>\w+)")
parser.compile()

log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 INFO: Processing value=100
2024-01-01 WARN: System status=degraded
2024-01-01 INFO: Processing value=7
2024-01-01 ERROR: System status=failed
"""

query = Query(parser).from_(log_data)

# Get all unique log types
print("Unique log types:")
for log_type in query.get_log_types():
    print(f"  {log_type}")

# Reset the stream for the next analysis
query.from_(log_data)

# Get log type occurrence counts
print("\nLog type counts:")
counts = query.get_log_type_counts()
for log_type, count in sorted(counts.items(), key=lambda x: -x[1]):
    print(f"  {count:3d} {log_type}")

# Reset the stream for the next analysis
query.from_(log_data)

# Get sample messages for each log type
print("\nLog type samples:")
samples = query.get_log_type_with_sample(sample_size=2)
for log_type, messages in samples.items():
    print(f"  {log_type}")
    for msg in messages:
        print(f"    - {msg.strip()}")
```

**Output:**
```
Unique log types:
  <timestamp> INFO: Processing <metric>
  <timestamp> WARN: System <status>
  <timestamp> ERROR: System <status>

Log type counts:
    3 <timestamp> INFO: Processing <metric>
    1 <timestamp> WARN: System <status>
    1 <timestamp> ERROR: System <status>

Log type samples:
  <timestamp> INFO: Processing <metric>
    - 2024-01-01 INFO: Processing value=42
    - 2024-01-01 INFO: Processing value=100
  <timestamp> WARN: System <status>
    - 2024-01-01 WARN: System status=degraded
  <timestamp> ERROR: System <status>
    - 2024-01-01 ERROR: System status=failed
```

---

## Key concepts

> **CRITICAL: You must understand these concepts to use `log-surgeon` correctly.**
>
> `log-surgeon` works **fundamentally differently** from traditional regex engines like Python's
> `re` module, PCRE, or JavaScript regex. Skipping this section may lead to patterns that don't
> work as expected.

### Token-based parsing and delimiters

**CRITICAL:** `log-surgeon` uses **token-based** parsing, not character-based regex matching like
traditional regex engines. This is the most important difference affecting how patterns work.

#### How tokenization works

Delimiters are characters used to split log messages into tokens. The default delimiters include:
- Whitespace: space, tab (`\t`), newline (`\n`), carriage return (`\r`)
- Punctuation: `:`, `,`, `!`, `;`, `%`, `@`, `/`, `(`, `)`, `[`, `]`

For example, with default delimiters, the log message:
```
"abc def ghi"
```
is tokenized into three tokens: `["abc", "def", "ghi"]`

You can customize delimiters when creating a Parser:

```python
parser = Parser(delimiters=r" \t\n,:")  # Custom delimiters
```

#### Token-based pattern matching

**Critical:** Patterns like `.*` only match **within a single token**, not across multiple tokens
or delimiters.

```python
from log_surgeon import Parser

parser = Parser()  # Default delimiters include space
parser.add_var("token", rf"(?<match>d.*)")
parser.compile()

# With "abc def ghi" tokenized as ["abc", "def", "ghi"]
event = parser.parse_event("abc def ghi")

# Matches only "def" (single token starting with 'd')
# Does NOT match "def ghi" (would cross a token boundary)
print(event['match'])  # Output: "def"
```

**In a traditional regex engine**, `d.*` would match `"def ghi"` (everything from 'd' to the end).
**In log-surgeon**, `d.*` matches only `"def"` because patterns cannot cross delimiter boundaries.

#### Why token-based?

Token-based parsing enables:
- **Faster parsing** by reducing the search space
- **Predictable behavior** aligned with log structure
- **Efficient log type generation** for analytics

#### Working with token boundaries

To match across multiple tokens, you must use **character classes** like `[a-zA-Z ]*` instead of
`.*`:

```python
from log_surgeon import Parser

parser = Parser()  # Default delimiters include space

# Using .* - only matches within a single token
parser.add_var("wrong", rf"(?<match>d.*)")  # Matches only "def"

# Using character classes - matches across tokens
parser.add_var("correct", rf"(?<match>d[a-z ]*i)")  # Matches "def ghi"
parser.compile()

event = parser.parse_event("abc def ghi")
print(event['match'])  # Output: "def ghi"
```

**Key rule:** Character classes like `[a-zA-Z]*`, `[a-z ]*`, or `[\w\s]*` can match across token
boundaries, but `.*` cannot.

#### Alternation requires grouping

**CRITICAL:** Alternation (`|`) works differently in log-surgeon compared to traditional regex
engines. You **must** use parentheses to group alternatives.

```python
from log_surgeon import Parser

parser = Parser()

# WRONG: Without grouping - matches "ab" AND ("c" OR "d") AND "ef"
parser.add_var("wrong", rf"(?<word>abc|def)")
# In log-surgeon, this is interpreted as: "ab" + "c|d" + "ef"
# Matches: "abcef" or "abdef" (NOT "abc" or "def")

# CORRECT: With grouping - matches "abc" OR "def"
parser.add_var("correct", rf"(?<word>(abc)|(def))")
# Matches: "abc" or "def"
parser.compile()
```

**In traditional regex engines**, `abc|def` means "abc" OR "def".
**In log-surgeon**, `abc|def` means "ab" + ("c" OR "d") + "ef".

**Key rule:** Always use `(abc)|(def)` syntax for alternation to match complete alternatives.

```python
# More examples:
parser.add_var("level", rf"(?<level>(ERROR)|(WARN)|(INFO))")   # Correct
parser.add_var("status", rf"(?<status>(success)|(failure))")   # Correct
parser.add_var("bad", rf"(?<status>success|failure)")          # Wrong - unexpected behavior
```

#### Optional patterns

For optional patterns, use `{0,1}` instead of `*`:

```python
from log_surgeon import Parser

parser = Parser()

# Avoid using * for optional patterns (matches 0 or more)
parser.add_var("avoid", rf"(?<level>(ERROR)|(WARN))*")  # Can match the empty string or multiple repetitions

# Do not use ? for optional patterns
parser.add_var("avoid2", rf"(?<level>(ERROR)|(WARN))?")  # May not work as expected

# Use {0,1} for optional patterns (matches 0 or 1);
# the braces are doubled because this is an f-string
parser.add_var("optional", rf"(?<level>(ERROR)|(WARN)){{0,1}}")
parser.compile()
```

**Best practice:** Use `{0,1}` for optional elements. Avoid `*` (0 or more) and `?` for optional
matching.

You can also explicitly include delimiters in your pattern:

```python
# To match "def ghi", explicitly include the space delimiter
parser.add_var("multi", rf"(?<match>d\w+\s+\w+)")
# This matches "def " as one token segment, followed by "ghi"
```

Or adjust your delimiters to change tokenization behavior:

```python
# Use only newline as a delimiter to treat entire lines as tokens
parser = Parser(delimiters=r"\n")
```

### Named capture groups

Use named capture groups in regex patterns to extract specific fields:

```python
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
```

The syntax `(?<name>pattern)` creates a capture group whose value can be accessed as
`event['name']`.

**Note:** See [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for best practices on
writing regex patterns.

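When a named group matches more than once in a single event (as `system_stack` did in the
stack-trace example above), the values are aggregated into a list. A minimal sketch:

```python
from log_surgeon import Parser

parser = Parser()
parser.add_var("item", rf"item=(?<item>\w+)")
parser.compile()

# The `item` group matches three times in one event
event = parser.parse_event("processed item=a item=b item=c\n")

# Repeated matches for one logical name come back as a list
print(event["item"])  # expected: ['a', 'b', 'c']
```
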
### Using raw f-strings for regex patterns

> **⚠️ STRONGLY RECOMMENDED: Use raw f-strings (`rf"..."`) for all regex patterns.**
>
> While not absolutely required, using regular strings will likely cause escaping issues and
> pattern failures. Raw f-strings prevent these problems.

Raw f-strings combine the benefits of:
- **Raw strings (`r"..."`)**: No need to double-escape regex special characters like `\d`, `\w`,
  `\n`
- **f-strings (`f"..."`)**: Easy interpolation of variables and pattern constants

#### Why use raw f-strings?

```python
# Without raw strings - requires double-escaping
parser.add_var("metric", "value=(\\d+)")  # Hard to read, error-prone

# With raw f-strings - single escaping, clean and readable
parser.add_var("metric", rf"value=(?<value>\d+)")
```

#### Watch out for braces in f-strings

When using f-strings, literal `{` and `}` characters must be escaped by doubling them:

```python
from log_surgeon import Parser, PATTERN

parser = Parser()

# Correct: Escape literal braces in regex
parser.add_var("json", rf"data={{(?<content>[^}}]+)}}")            # Matches: data={...}
parser.add_var("range", rf"range={{(?<min>\d+),(?<max>\d+)}}")     # Matches: range={10,20}

# Using PATTERN constants with interpolation
parser.add_var("ip", rf"IP: (?<ip>{PATTERN.IPV4})")
parser.add_var("float", rf"value=(?<val>{PATTERN.FLOAT})")

# Common regex patterns
parser.add_var("digits", rf"\d+ items")  # No double-escaping needed
parser.add_var("word", rf"name=(?<name>\w+)")
parser.add_var("whitespace", rf"split\s+by\s+spaces")

parser.compile()
```

#### Examples: raw f-strings vs regular strings

```python
# Regular string - requires double-escaping
parser.add_var("path", "path=(?<path>\\w+/\\w+)")  # Hard to read

# Raw f-string - natural regex syntax
parser.add_var("path", rf"path=(?<path>\w+/\w+)")  # Clean and readable

# With interpolation
log_level = "INFO|WARN|ERROR"
parser.add_var("level", rf"(?<level>{log_level})")  # Easy to compose
```

**Recommendation:** Consistently use `rf"..."` for all regex patterns. This approach:
- Avoids double-escaping mistakes that break patterns
- Makes patterns more readable
- Allows easy use of PATTERN constants and variables
- Only requires watching for literal braces `{` and `}` in f-strings (escape as `{{` and `}}`)

Using regular strings (`"..."`) requires double-escaping (e.g., `"\\d+"`), which is error-prone
and hard to read.

### Logical vs. physical names

Internally, log-surgeon uses "physical" names (e.g., `CGPrefix0`, `CGPrefix1`) for capture groups,
while you work with "logical" names (e.g., `user_id`, `thread`). The `GroupNameResolver` handles
this mapping automatically.

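You normally never see physical names, but the resolver can be used directly (see
[GroupNameResolver](#groupnameresolver) below). A minimal sketch; the import path and prefix
string are assumptions:

```python
from log_surgeon import GroupNameResolver  # import path assumed

resolver = GroupNameResolver("CGPrefix")

# Each call mints a fresh physical name for a logical name
physical = resolver.create_new_physical_name("user_id")

print(physical)                                # e.g., CGPrefix0
print(resolver.get_logical_name(physical))     # user_id
print(resolver.get_physical_names("user_id"))  # all physical names mapped to user_id
```
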
### Schema format

The schema defines the delimiters, timestamps, and variables used for parsing:

```
// schema delimiters
delimiters: \t\r\n:,!;%@/\(\)\[\]

// schema timestamps
timestamp:<timestamp_regex>

// schema variables
variable_name:<variable_regex>
```

When using the fluent API (`Parser.add_var()` and `Parser.compile()`), the schema is built
automatically.

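For cases where you want the schema string explicitly, here is a minimal sketch using
`SchemaCompiler` and `Parser.load_schema()` (both documented under [Reference](#reference); the
import paths and the `TS` pattern are illustrative):

```python
from log_surgeon import Parser, SchemaCompiler  # import paths assumed

# Build the schema explicitly instead of through the fluent Parser API
compiler = SchemaCompiler()
compiler.add_timestamp("TS", rf"\d{{4}}-\d{{2}}-\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
compiler.add_var("metric", rf"value=(?<value>\d+)")

# Compile the schema text and keep the logical -> physical name mapping
schema = compiler.compile()
resolver = compiler.get_capture_group_name_resolver()

parser = Parser()
parser.load_schema(schema, resolver)

event = parser.parse_event("2024-01-01 10:00:00 value=42\n")
print(event["value"])  # expected: 42
```
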
### Common pitfalls

**Pattern doesn't match anything**
- Check: Are you using `.*` to match across tokens? Use `[a-zA-Z ]*` instead
- Check: Did you forget to call `parser.compile()`?
- Check: Are your delimiters splitting tokens unexpectedly?

**Alternation not working (`abc|def`)**
- Problem: `(?<name>abc|def)` doesn't match "abc" or "def" as expected
- Solution: Use `(?<name>(abc)|(def))` with explicit grouping

**Pattern works in a regex tester but not here**
- Remember: log-surgeon is token-based, not character-based
- Traditional regex engines match across entire strings
- log-surgeon matches within token boundaries (delimited by spaces, colons, etc.)
- Read: [Token-based parsing](#token-based-parsing-and-delimiters)

**Escape sequence errors in Python**
- Problem: `parser.add_var("digits", "(?<num>\d+)")` triggers an invalid-escape-sequence warning
  (a `SyntaxError` in future Python versions) and is easy to get wrong
- Solution: Use `rf"..."` (a raw f-string) instead of `"..."` or `f"..."`
- Example: `parser.add_var("digits", rf"(?<num>\d+)")`

**Optional pattern matching incorrectly**
- Problem: Using `?` or `*` for optional patterns
- Solution: Use `{0,1}` for optional elements
- Example: `(?<level>(ERROR)|(WARN)){0,1}` for an optional log level

---

## Reference

| Task | Syntax |
|------|--------|
| Named capture | `(?<name>pattern)` |
| Alternation | `(?<name>(opt1)\|(opt2))` (NOT `opt1\|opt2`) |
| Optional | `{0,1}` (NOT `?` or `*`) |
| Match across tokens | Use `[a-z ]*` (NOT `.*`) |
| Pattern string | `rf"..."` (raw f-string recommended) |
| All variables | `.select(["*"])` |
| Log type | `.select(["@log_type"])` |
| Original message | `.select(["@log_message"])` |

### Parser

High-level parser for extracting structured data from unstructured log messages.

#### Constructor

- `Parser(delimiters: str = r" \t\r\n:,!;%@/\(\)\[\]")`
  - Initialize a parser with optional custom delimiters
  - Default delimiters include space, tab, newline, and common punctuation

#### Methods

- `add_var(name: str, regex: str, hide_var_name_if_named_group_present: bool = True) -> Parser`
  - Add a variable pattern to the parser's schema
  - Supports named capture groups using `(?<name>)` syntax
  - Use raw f-strings (`rf"..."`) for regex patterns (see
    [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns))
  - Returns self for method chaining

- `add_timestamp(name: str, regex: str) -> Parser`
  - Add a timestamp pattern to the parser's schema
  - Returns self for method chaining

- `compile(enable_debug_logs: bool = False) -> None`
  - Build and initialize the parser with the configured schema
  - Must be called after adding variables and before parsing
  - Set `enable_debug_logs=True` to output debug information to stderr

- `load_schema(schema: str, group_name_resolver: GroupNameResolver) -> None`
  - Load a pre-built schema string to configure the parser

- `parse(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Generator[LogEvent, None, None]`
  - Parse all log events from a string, file object, or stream
  - Accepts strings, text/binary file objects, StringIO, or BytesIO
  - Yields LogEvent objects for each parsed event

- `parse_event(payload: str) -> LogEvent | None`
  - Parse a single log event from a string (convenience method)
  - Wraps `parse()` and returns the first event
  - Returns a LogEvent, or None if no event was found

### LogEvent

Represents a parsed log event with extracted variables.

#### Methods

- `get_log_message() -> str`
  - Get the original log message

- `get_log_type() -> str`
  - Get the generated log type (template) with logical group names

- `get_capture_group(logical_capture_group_name: str, raw_output: bool = False) -> str | list | None`
  - Get the value of a capture group by its logical name
  - If `raw_output=False` (default), single values are unwrapped from lists
  - Returns None if the capture group is not found

- `get_capture_group_str_representation(field: str, raw_output: bool = False) -> str`
  - Get the string representation of a capture group value

- `get_resolved_dict() -> dict[str, str | list]`
  - Get a dictionary with all capture groups using logical (user-defined) names
  - Physical names (CGPrefix*) are converted to logical names
  - Timestamp fields are consolidated under the "timestamp" key
  - Single-value lists are unwrapped to scalar values
  - "@LogType" is excluded from the output

- `__getitem__(key: str) -> str | list`
  - Access capture group values by name (e.g., `event['field_name']`)
  - Shorthand for `get_capture_group(key, raw_output=False)`

- `__str__() -> str`
  - Get a formatted JSON representation of the log event with logical group names
  - Uses `get_resolved_dict()` internally

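A short sketch exercising these accessors on a parsed event:

```python
from log_surgeon import Parser

parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.compile()

event = parser.parse_event("2024-01-01 INFO: Processing value=42\n")

print(event.get_log_message().strip())  # original message
print(event.get_log_type().strip())     # template with placeholders
print(event.get_capture_group("value"))                   # "42" (single value, unwrapped)
print(event.get_capture_group("value", raw_output=True))  # ["42"] (raw list form)
print(event.get_resolved_dict())        # all groups under logical names
print(event["value"])                   # shorthand for get_capture_group("value")
```
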
### Query

Query builder for parsing log events into structured data formats.

#### Constructor

- `Query(parser: Parser)`
  - Initialize a query with a configured parser

#### Methods

- `select(fields: list[str]) -> Query`
  - Select fields to extract from log events
  - Supports variable names, `"*"` for all variables, `"@log_type"` for the log type, and
    `"@log_message"` for the original message
  - The `"*"` wildcard can be combined with other fields (e.g., `["@log_type", "*"]`)
  - Returns self for method chaining

- `filter(predicate: Callable[[LogEvent], bool]) -> Query`
  - Filter log events using a predicate function
  - The predicate receives a LogEvent and returns True to include it, False to exclude it
  - Returns self for method chaining
  - Example: `query.filter(lambda event: int(event['value']) > 50)`

- `from_(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
  - Set the input source to parse
  - Accepts strings, text/binary file objects, StringIO, or BytesIO
  - Strings are automatically wrapped in StringIO
  - Returns self for method chaining

- `select_from(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
  - Alias for `from_()`
  - Returns self for method chaining

- `validate_query() -> Query`
  - Validate that the query is properly configured
  - Returns self for method chaining

- `to_dataframe() -> pd.DataFrame`
  - Convert parsed events to a pandas DataFrame

- `to_df() -> pd.DataFrame`
  - Alias for `to_dataframe()`

- `to_arrow() -> pa.Table`
  - Convert parsed events to a PyArrow Table

- `to_pa() -> pa.Table`
  - Alias for `to_arrow()`

- `get_rows() -> list[list]`
  - Extract rows of field values from parsed events

- `get_vars() -> KeysView[str]`
  - Get all variable names (logical capture group names) defined in the schema

- `get_log_types() -> Generator[str, None, None]`
  - Get all unique log types from parsed events
  - Yields log types in the order they are first encountered
  - Useful for discovering log patterns in your data

- `get_log_type_counts() -> dict[str, int]`
  - Get the count of occurrences for each unique log type
  - Returns a dictionary mapping log types to their counts
  - Useful for analyzing log type distribution

- `get_log_type_with_sample(sample_size: int = 3) -> dict[str, list[str]]`
  - Get sample log messages for each unique log type
  - Returns a dictionary mapping log types to lists of sample messages
  - Useful for understanding which actual messages match each template

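A short sketch chaining several of these methods; the printed shapes follow the descriptions
above:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
"""

query = (
    Query(parser)
    .select(["@log_type", "*"])  # log type plus every schema variable
    .from_(log_data)
    .filter(lambda event: int(event["value"]) >= 50)
    .validate_query()
)

print(query.get_vars())  # logical capture group names defined in the schema
print(query.get_rows())  # one row of selected field values per surviving event
```
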
\"custom_id\",\n rf\"ID-(?<id>[{PATTERN.JAVA_IDENTIFIER_CHARSET}]+)\"\n)\n\nparser.compile()\n```\n\n---\n\n## Development\n\n### Building from source\n\n```bash\n# Clone the repository\ngit clone https://github.com/y-scope/log-surgeon-ffi-py.git\ncd log-surgeon-ffi-py\n\n# Install the project in editable mode\npip install -e .\n\n# Build the extension\ncmake -S . -B build\ncmake --build build\n```\n\n### Running tests\n\n```bash\n# Install test dependencies\npip install pytest\n\n# Run tests\npython -m pytest tests/\n```\n\n---\n\n## License\n\nApache License 2.0 - See [LICENSE](LICENSE) for details.\n\n---\n\n## Links\n\n- [Homepage](https://github.com/y-scope/log-surgeon-ffi-py)\n- [Bug Tracker](https://github.com/y-scope/log-surgeon-ffi-py/issues)\n- [log-surgeon C++ library](https://github.com/y-scope/log-surgeon)\n\n---\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Python FFI bindings for log-surgeon: high-performance parsing of unstructured logs into structured data",
"version": "0.1.0b4",
"project_urls": {
"Bug Tracker": "https://github.com/y-scope/log-surgeon-ffi-py/issues",
"Homepage": "https://github.com/y-scope/log-surgeon-ffi-py"
},
"split_keywords": [
"logging",
" log-parsing",
" log-analysis",
" structured-data",
" performance",
" observability"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5a9793c6d62614a9984080cff66f5664f08d5b963c98c6bbd108e26ddb307085",
"md5": "d41d2bd9b9093d8ea5ed9cbc572e2d40",
"sha256": "49e1f0712140e8b53d39b0970d76f0291e7ba470ea4df8c7ed41190de9c43114"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "d41d2bd9b9093d8ea5ed9cbc572e2d40",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 339352,
"upload_time": "2025-10-27T16:38:10",
"upload_time_iso_8601": "2025-10-27T16:38:10.766794Z",
"url": "https://files.pythonhosted.org/packages/5a/97/93c6d62614a9984080cff66f5664f08d5b963c98c6bbd108e26ddb307085/log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3fdff3da814b7f0f078f3e6d81c86c11c90f6446f62ecd340e64c6e5a448638e",
"md5": "18611017e3299da3169c5c808d8ccccb",
"sha256": "46151940c76d82b6bc8567d84b491c3d6dafa1b3577d554dd229bc9c28c2c2e6"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "18611017e3299da3169c5c808d8ccccb",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 367500,
"upload_time": "2025-10-27T16:38:13",
"upload_time_iso_8601": "2025-10-27T16:38:13.131031Z",
"url": "https://files.pythonhosted.org/packages/3f/df/f3da814b7f0f078f3e6d81c86c11c90f6446f62ecd340e64c6e5a448638e/log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4e440479773731d183da1c3acfcde4777c53ee55bc9d99b4309d1c2a5f0ee509",
"md5": "4ffe485d25b62841135e100833a2dd69",
"sha256": "d69b612e89e06b565ee5c1c51e51c760eb6304b32eba9c3507cee8f785e01a48"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "4ffe485d25b62841135e100833a2dd69",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 351110,
"upload_time": "2025-10-27T16:38:16",
"upload_time_iso_8601": "2025-10-27T16:38:16.712690Z",
"url": "https://files.pythonhosted.org/packages/4e/44/0479773731d183da1c3acfcde4777c53ee55bc9d99b4309d1c2a5f0ee509/log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "74b97a10e1e130ff31b24edf37f5912eda06d881a2e8593d0add6f07a8e100d9",
"md5": "c42ef923f61f8c73d5b9ad065b30a3e0",
"sha256": "7d00e36a937d1667d7ec38d388fc5daf64866815f1bb7d747c7e82b4b41d6326"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "c42ef923f61f8c73d5b9ad065b30a3e0",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1265909,
"upload_time": "2025-10-27T16:38:18",
"upload_time_iso_8601": "2025-10-27T16:38:18.253028Z",
"url": "https://files.pythonhosted.org/packages/74/b9/7a10e1e130ff31b24edf37f5912eda06d881a2e8593d0add6f07a8e100d9/log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "034899af9b94d3a88f58120225e261868ea5ccaafdb329b371daa9187f53502d",
"md5": "5f9ffb0dab694ab0d9e434e946b8c004",
"sha256": "924ce7da1aa58e7e7409284b76eb7fef0294682f4468b08aff218fb2db158f28"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "5f9ffb0dab694ab0d9e434e946b8c004",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1431122,
"upload_time": "2025-10-27T16:38:19",
"upload_time_iso_8601": "2025-10-27T16:38:19.877086Z",
"url": "https://files.pythonhosted.org/packages/03/48/99af9b94d3a88f58120225e261868ea5ccaafdb329b371daa9187f53502d/log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ec1e7bb7032a170ca87c403af733d886a999b9018634e552d929af3751c8979e",
"md5": "8a319b701eed48c1b33c8069d5d60186",
"sha256": "3b0a36c1da5745e94dcc1ff2d0ecaebf9182043f8d367634f49b9c5361f3d952"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "8a319b701eed48c1b33c8069d5d60186",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1326058,
"upload_time": "2025-10-27T16:38:21",
"upload_time_iso_8601": "2025-10-27T16:38:21.139370Z",
"url": "https://files.pythonhosted.org/packages/ec/1e/7bb7032a170ca87c403af733d886a999b9018634e552d929af3751c8979e/log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6efba8fbf4c654bcd437aad5f7bed67da682af9bbfaf5b87ba49ff65a8bb3057",
"md5": "a1f92c4bf528f05c314b2b6106e889e7",
"sha256": "51e1f35e23996f057cdb6757cf58287f41846e5cce4aa49f7b62504626770ce2"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "a1f92c4bf528f05c314b2b6106e889e7",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 339350,
"upload_time": "2025-10-27T16:38:22",
"upload_time_iso_8601": "2025-10-27T16:38:22.591542Z",
"url": "https://files.pythonhosted.org/packages/6e/fb/a8fbf4c654bcd437aad5f7bed67da682af9bbfaf5b87ba49ff65a8bb3057/log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4f2ad5172f1820c4b5701907924403de1d7b4301a7b89a8d2fc135e5646621a3",
"md5": "6536785b7fcab4265466819a9e580eef",
"sha256": "c60fa11d004817bedff0bf2802946876b6d5fe668b42c835137dcc2e2646f910"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "6536785b7fcab4265466819a9e580eef",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 367498,
"upload_time": "2025-10-27T16:38:23",
"upload_time_iso_8601": "2025-10-27T16:38:23.938708Z",
"url": "https://files.pythonhosted.org/packages/4f/2a/d5172f1820c4b5701907924403de1d7b4301a7b89a8d2fc135e5646621a3/log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b7d591f1fa624e35700d075b42c388c3daf61d8feb0a1f623941b9cc40e4ad64",
"md5": "76edd55d4262fd92671d94d372504c00",
"sha256": "ee0cb940c3ca50ed68ec44fbb46640ef80704a02a086c36d8c6fb5165a4fa1c0"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "76edd55d4262fd92671d94d372504c00",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 351110,
"upload_time": "2025-10-27T16:38:25",
"upload_time_iso_8601": "2025-10-27T16:38:25.117741Z",
"url": "https://files.pythonhosted.org/packages/b7/d5/91f1fa624e35700d075b42c388c3daf61d8feb0a1f623941b9cc40e4ad64/log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c0af151b7559edfca2b8ab9a6a7c4f4bd68f69699a3494fedc00f065dc56aa9b",
"md5": "eeb5eef6abd94f43bd40709b990eac4a",
"sha256": "acda63fc5be39f4fa70500b6685382485897f0528410ac4f2ceb6d8f5c86e17b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "eeb5eef6abd94f43bd40709b990eac4a",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1265911,
"upload_time": "2025-10-27T16:38:26",
"upload_time_iso_8601": "2025-10-27T16:38:26.576461Z",
"url": "https://files.pythonhosted.org/packages/c0/af/151b7559edfca2b8ab9a6a7c4f4bd68f69699a3494fedc00f065dc56aa9b/log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ce07ae7c8d14f1c90bbff7ad2e49e8f5c8ff4c5710bbc2015122d3d67e63d753",
"md5": "09add42c54812eed8eaf11bf49046310",
"sha256": "651f9e565f1c41464bc8928bf83b81be9ed6b75356bc5f2a6ae6dd74c32b16a3"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "09add42c54812eed8eaf11bf49046310",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1431125,
"upload_time": "2025-10-27T16:38:28",
"upload_time_iso_8601": "2025-10-27T16:38:28.200600Z",
"url": "https://files.pythonhosted.org/packages/ce/07/ae7c8d14f1c90bbff7ad2e49e8f5c8ff4c5710bbc2015122d3d67e63d753/log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4f33f269167f909cc2c45e4e557e5731a92f2027338887d5961c4c025f94ad94",
"md5": "6da0ec55ebdf7a0a6be81395b6e6be58",
"sha256": "023fa13694855b71b92cfafb0bc7b08d54f3f9c7bb4f1b2cf18713413f4aebc5"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "6da0ec55ebdf7a0a6be81395b6e6be58",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1326062,
"upload_time": "2025-10-27T16:38:29",
"upload_time_iso_8601": "2025-10-27T16:38:29.376900Z",
"url": "https://files.pythonhosted.org/packages/4f/33/f269167f909cc2c45e4e557e5731a92f2027338887d5961c4c025f94ad94/log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dffa1339883bba67efce44689851a6a79c8281e580ae0dff67763c60d5605d2d",
"md5": "1a8e8696d9f64575ba3497b431789ae6",
"sha256": "43f3012e955e38b1395b3691e5952a9baee0347fb0600c433d8fbf8e205e488e"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "1a8e8696d9f64575ba3497b431789ae6",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 339403,
"upload_time": "2025-10-27T16:38:30",
"upload_time_iso_8601": "2025-10-27T16:38:30.728274Z",
"url": "https://files.pythonhosted.org/packages/df/fa/1339883bba67efce44689851a6a79c8281e580ae0dff67763c60d5605d2d/log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7103deb181fb7d448cccb2e4706299714f3e1b9aa55527c2d255255048a697cf",
"md5": "8d4fe6d762f5915d255d313c666512a4",
"sha256": "91237e2db3610016566fa6d281ed1229b58c6ccb7e6c3551bd0f27801b72ae09"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "8d4fe6d762f5915d255d313c666512a4",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 367637,
"upload_time": "2025-10-27T16:38:32",
"upload_time_iso_8601": "2025-10-27T16:38:32.310179Z",
"url": "https://files.pythonhosted.org/packages/71/03/deb181fb7d448cccb2e4706299714f3e1b9aa55527c2d255255048a697cf/log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "478bfcfbfa67ab585ca28461f16e94e24e7624d0507b4bbf564d13bd231b4bae",
"md5": "36f0f7c431343f7fa9d3d3e56c59d2f5",
"sha256": "979ecc9610850359d501c88a0c4a8f3e3e466d58cd53348877466c596971e0df"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "36f0f7c431343f7fa9d3d3e56c59d2f5",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 351213,
"upload_time": "2025-10-27T16:38:33",
"upload_time_iso_8601": "2025-10-27T16:38:33.337177Z",
"url": "https://files.pythonhosted.org/packages/47/8b/fcfbfa67ab585ca28461f16e94e24e7624d0507b4bbf564d13bd231b4bae/log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a6b9a10e1fa6ea57d6e3396393f832030d80293825dfcc0e15b302d199f8b90a",
"md5": "a870e22ccc7a6f0a3639537b01f01b2f",
"sha256": "d9093a7d5cc212125ca4ed142f0b780f12e4904be133deacb51cdbc331d45c1b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "a870e22ccc7a6f0a3639537b01f01b2f",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 1265961,
"upload_time": "2025-10-27T16:38:34",
"upload_time_iso_8601": "2025-10-27T16:38:34.423812Z",
"url": "https://files.pythonhosted.org/packages/a6/b9/a10e1fa6ea57d6e3396393f832030d80293825dfcc0e15b302d199f8b90a/log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "57e9c58eecb156b62229ce532ce8ab0abaefbaef87f8bc5bdb3e39b2fc685211",
"md5": "c2d828c9f6fcc58949a69265d34453f0",
"sha256": "3084e098a1a5d599b0a2d6203d43cffc2a52f903e3cb398ac409141a7ee0ca33"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "c2d828c9f6fcc58949a69265d34453f0",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 1431166,
"upload_time": "2025-10-27T16:38:35",
"upload_time_iso_8601": "2025-10-27T16:38:35.749438Z",
"url": "https://files.pythonhosted.org/packages/57/e9/c58eecb156b62229ce532ce8ab0abaefbaef87f8bc5bdb3e39b2fc685211/log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "06deefe535ee1021cb9ddb8fa8fd4b1e1f63b13dd81f3bb838ae3f7dc17d8ed2",
"md5": "6e855fa38dee3f740ad4823ed2ce7e87",
"sha256": "f4512155ea6d885b6eeb8c8ba30f5e58cb2b27565c782fd8ef295d856f151ace"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "6e855fa38dee3f740ad4823ed2ce7e87",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 1326131,
"upload_time": "2025-10-27T16:38:36",
"upload_time_iso_8601": "2025-10-27T16:38:36.941029Z",
"url": "https://files.pythonhosted.org/packages/06/de/efe535ee1021cb9ddb8fa8fd4b1e1f63b13dd81f3bb838ae3f7dc17d8ed2/log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9b0fc8593373ab2a5d825b354ef424a85aac2c13bf2e8df462b894bae15a5673",
"md5": "e662ec48f708bb62dcee6efd2cecd9bb",
"sha256": "d25162f8bbfc50bac08c7333efb9a28af8b55dd4682446a946c7852b3ebfb2d8"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "e662ec48f708bb62dcee6efd2cecd9bb",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 339361,
"upload_time": "2025-10-27T16:38:38",
"upload_time_iso_8601": "2025-10-27T16:38:38.101138Z",
"url": "https://files.pythonhosted.org/packages/9b/0f/c8593373ab2a5d825b354ef424a85aac2c13bf2e8df462b894bae15a5673/log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "92820e7e0b07ad358808415594a5cdfe25741a2e3da43f81be61c4d95fe5d771",
"md5": "ded8a788c63ee164134e682cd66d6610",
"sha256": "031c233bfa9c74cc1706150c1e41e5e37c654eded3c6897a9a0f05abf26ece5b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "ded8a788c63ee164134e682cd66d6610",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 367582,
"upload_time": "2025-10-27T16:38:39",
"upload_time_iso_8601": "2025-10-27T16:38:39.536518Z",
"url": "https://files.pythonhosted.org/packages/92/82/0e7e0b07ad358808415594a5cdfe25741a2e3da43f81be61c4d95fe5d771/log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "adcb9f9094ade52939b454cf84023a5f5716a29d5020b8919244b2518f244dd8",
"md5": "998ab75424bb6732c60a56398d07490d",
"sha256": "bcf3b5d92f939c3ea71ceea7c3bdee00fab7ea5da86708a4b2fac16c996696a3"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "998ab75424bb6732c60a56398d07490d",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 351212,
"upload_time": "2025-10-27T16:38:42",
"upload_time_iso_8601": "2025-10-27T16:38:42.011616Z",
"url": "https://files.pythonhosted.org/packages/ad/cb/9f9094ade52939b454cf84023a5f5716a29d5020b8919244b2518f244dd8/log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "89197a0dcfea834093112589a3776c848e3ecdfb060ab84a6b1bf1a406a195c3",
"md5": "07c3dc8ae0765e774a7f81af1e59959f",
"sha256": "3ab6cf3ba2ff2c1717a569f5055aa217299418aea7d8ad48112a4db039f1cae3"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "07c3dc8ae0765e774a7f81af1e59959f",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 1265955,
"upload_time": "2025-10-27T16:38:43",
"upload_time_iso_8601": "2025-10-27T16:38:43.219212Z",
"url": "https://files.pythonhosted.org/packages/89/19/7a0dcfea834093112589a3776c848e3ecdfb060ab84a6b1bf1a406a195c3/log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7fd8f3d028b1925dd93eae879987f2a737aa0e764f25cdac2b88e1f6877a9788",
"md5": "7b6cfdd93f6a98f01518b275b7ce0f5b",
"sha256": "61d542ce394318af73197f8809e738721d259386a9be41d8ecc28fb83d4cf2cc"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "7b6cfdd93f6a98f01518b275b7ce0f5b",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 1431163,
"upload_time": "2025-10-27T16:38:44",
"upload_time_iso_8601": "2025-10-27T16:38:44.463094Z",
"url": "https://files.pythonhosted.org/packages/7f/d8/f3d028b1925dd93eae879987f2a737aa0e764f25cdac2b88e1f6877a9788/log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "669069c33f2e9c2577f5f46f3684de017abb2439b89ee47d0bc2e2b70466ca86",
"md5": "4bed7d66b47bf496a6056617d3537581",
"sha256": "42da0434c6e145bfa6b755fa81276160e113018b826b043314463fb9e9d9feaa"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "4bed7d66b47bf496a6056617d3537581",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 1326171,
"upload_time": "2025-10-27T16:38:45",
"upload_time_iso_8601": "2025-10-27T16:38:45.747890Z",
"url": "https://files.pythonhosted.org/packages/66/90/69c33f2e9c2577f5f46f3684de017abb2439b89ee47d0bc2e2b70466ca86/log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1e0b6dce3ec6af13b85904bce664e934a5f8d68567f1711cf1a25a20fe444abb",
"md5": "2e4b37c1e8b78075d30523977aa9a218",
"sha256": "316d8b74384761b65d5325c7abe79609f25fc78624f52824185f543bc7175367"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "2e4b37c1e8b78075d30523977aa9a218",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 339348,
"upload_time": "2025-10-27T16:38:46",
"upload_time_iso_8601": "2025-10-27T16:38:46.957821Z",
"url": "https://files.pythonhosted.org/packages/1e/0b/6dce3ec6af13b85904bce664e934a5f8d68567f1711cf1a25a20fe444abb/log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4d3dff25e2768a894103983450b00e9674b78359d71e8f44671e35da1c84277b",
"md5": "5da813ff7e7c12dccaf6089c839a09b6",
"sha256": "5ae6c133fe4e2e7ece46431c360eaaaa14b5a8c093b978fb6a0c14df253f4b8b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "5da813ff7e7c12dccaf6089c839a09b6",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 367503,
"upload_time": "2025-10-27T16:38:47",
"upload_time_iso_8601": "2025-10-27T16:38:47.969148Z",
"url": "https://files.pythonhosted.org/packages/4d/3d/ff25e2768a894103983450b00e9674b78359d71e8f44671e35da1c84277b/log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d9a7bad78b0e1356d3219e7b0a1dd401fe08ad6dc92644550ffc13c9bb428d1b",
"md5": "ce494ff4781f1187838ede47a128e5fb",
"sha256": "7474c0c4e65f7b277cebb9dafd82f82d3bc1bb9aff46f57e0fb86204e2d58e33"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "ce494ff4781f1187838ede47a128e5fb",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 351108,
"upload_time": "2025-10-27T16:38:49",
"upload_time_iso_8601": "2025-10-27T16:38:49.017187Z",
"url": "https://files.pythonhosted.org/packages/d9/a7/bad78b0e1356d3219e7b0a1dd401fe08ad6dc92644550ffc13c9bb428d1b/log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "20d7f19e12bf8ba83274171dc3d785a50c3461287180d87ccdaca01952f5cd0f",
"md5": "375ef90bf5279a0abbd8433d129a8de5",
"sha256": "e0e13f635da79b02ecdc298840cb7a5b0e825854f4c03c3a33e8d2fa29985078"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "375ef90bf5279a0abbd8433d129a8de5",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1265907,
"upload_time": "2025-10-27T16:38:50",
"upload_time_iso_8601": "2025-10-27T16:38:50.136066Z",
"url": "https://files.pythonhosted.org/packages/20/d7/f19e12bf8ba83274171dc3d785a50c3461287180d87ccdaca01952f5cd0f/log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c03fd81ea6fed39c2beae484a3a63bb27e1bacd2854d6d78623e856033306a16",
"md5": "bb69446610c7603153a64c1fa3e10785",
"sha256": "0164e56dbaccad540f1cd3dd035d5853f8fbd76e6b186c3b501a85f1cfda6c3b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "bb69446610c7603153a64c1fa3e10785",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1431117,
"upload_time": "2025-10-27T16:38:51",
"upload_time_iso_8601": "2025-10-27T16:38:51.426058Z",
"url": "https://files.pythonhosted.org/packages/c0/3f/d81ea6fed39c2beae484a3a63bb27e1bacd2854d6d78623e856033306a16/log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "11ecfc8446edcff6399b8e099db9ea78ff80d27127dae952bf7d13b5e4a20c35",
"md5": "5f797b1cdd12bcbbc946ed41b6782074",
"sha256": "d0956dfa579acbd51810bcd6aadbe7c5eb6548a70645cdcdd97951670d667a27"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "5f797b1cdd12bcbbc946ed41b6782074",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1326057,
"upload_time": "2025-10-27T16:38:53",
"upload_time_iso_8601": "2025-10-27T16:38:53.007390Z",
"url": "https://files.pythonhosted.org/packages/11/ec/fc8446edcff6399b8e099db9ea78ff80d27127dae952bf7d13b5e4a20c35/log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-27 16:38:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "y-scope",
"github_project": "log-surgeon-ffi-py",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "log-surgeon-ffi"
}