# `log-surgeon-ffi`
`log-surgeon-ffi` provides Python foreign function interface (FFI) bindings for
[`log-surgeon`](https://github.com/y-scope/log-surgeon).
---
## Quick navigation
[**Overview**](#overview)
* [Why `log-surgeon`?](#why-log-surgeon)
* [Key capabilities](#key-capabilities)
* [Structured output and downstream capabilities](#structured-output-and-downstream-capabilities)
* [When to use `log-surgeon`](#when-to-use-log-surgeon)
[**Getting started**](#getting-started)
* [System requirements](#system-requirements)
* [Installation](#installation)
* [First steps](#first-steps)
* [Important prerequisites](#important-prerequisites)
* [Quick start examples](#quick-start-examples)
[**Key concepts**](#key-concepts)
* [Token-based parsing and delimiters](#token-based-parsing-and-delimiters)
* [Named capture groups](#named-capture-groups)
* [Using raw f-strings for regex patterns](#using-raw-f-strings-for-regex-patterns)
[**Reference**](#reference)
* [Parser API](#parser)
* [Query API](#query)
* [PATTERN constants](#pattern)
[**Development**](#development)
* [Building from source](#building-from-source)
* [Running tests](#running-tests)
---
## Overview
[`log-surgeon`](https://github.com/y-scope/log-surgeon) is a high-performance C++ library that
enables efficient extraction of structured information from unstructured log files.
### Why `log-surgeon`?
Traditional regex engines are often slow to execute, error-prone, and costly to maintain. For
example, Meta uses RE2 (a state-of-the-art regex engine) to parse logs, but still faces scalability
and maintenance challenges that limit extraction to a small set of fields such as timestamps,
levels, and component names.
`log-surgeon` streamlines the process by identifying, extracting, and labeling variable values with
semantic context, and then inferring a log template in a single pass. `log-surgeon` is also built to
accommodate structural variability. Values may shift position, appear multiple times, or change order
entirely, but with `log-surgeon`, you simply define the variable patterns, and `log-surgeon`
JIT-compiles a tagged-DFA state machine to drive the full pipeline.
### Key capabilities
* **Extract variables** from log messages using regex patterns with named capture groups
* **Generate log types** (templates) automatically for log analysis
* **Parse streams** efficiently for large-scale log processing
* **Export data** to pandas DataFrames and PyArrow Tables
### Structured output and downstream capabilities
Unstructured log data is automatically transformed into structured semantic representations.
* **Log types (templates)**: Variables are replaced with placeholders to form reusable templates.
For example, roughly 200,000 Spark log messages can be reduced to about 55 distinct templates, which
supports pattern analysis and anomaly detection.
* **Semantic variables**: Extracted key-value pairs with semantic context (e.g., `app_id`,
`app_name`, `worker_id`) can be used directly for analysis.
This structured output unlocks powerful downstream capabilities:
* **Knowledge graph construction.** Build relationship graphs between entities extracted from logs
(e.g., linking `app_id` → `app_name` → `worker_id`). The structured output fits tools such as
[Stitch](https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhao), which
uses flow reconstruction from logs to perform non-intrusive performance profiling and debugging
across distributed systems.
* **Template-based summarization.** Compress massive datasets into compact template sets for human
and agent consumption. Templates act as natural tokens for LLMs. Instead of millions of raw lines,
provide a small number of distinct templates with statistics.
* **Hybrid search.** Combine free-text search with structured queries. Log types enable
auto-completion and query suggestions on large datasets. Instead of searching through millions of
raw log lines, search across a compact set of templates first. Then project and filter on
structured variables (e.g., `status == "ERROR"`, `response_time > 1000`), and aggregate for
analysis.
* **Agentic automation.** Agents can query by template, analyze variable distributions, identify
anomalies, and automate debugging tasks using structured signals rather than raw text.
### When to use `log-surgeon`
**Good fit**
* Large-scale log processing (millions of lines)
* Extracting structured data from semi-structured logs
* Generating log templates for analytics
* Multi-line log events (stack traces, JSON dumps)
* Performance-critical parsing
**Not ideal**
* Simple one-off text extraction (use Python `re` module)
* Highly irregular text without consistent delimiters
* Patterns requiring full PCRE features (lookahead, backreferences)
---
## Getting started
Follow the instructions below to get started with `log-surgeon-ffi`.
### System requirements
- Python >= 3.9
- pandas
- pyarrow
#### Build requirements
- C++20 compatible compiler
- CMake >= 3.15
### Installation
To install the library with pandas and PyArrow support for DataFrame/Arrow table exports, run the
following command:
```bash
pip install log-surgeon-ffi
```
To verify your installation, run the following command:
```bash
python -c "from log_surgeon import Parser; print('Installation successful.')"
```
**Note:** pandas and PyArrow are installed by default for convenience. The core parser itself only
needs them if you use the DataFrame or Arrow export features.
### First steps
After installation, follow these steps:
1. **Read [Key Concepts](#key-concepts).** Token-based parsing differs from traditional regex.
2. **Run a [Quick start example](#quick-start-examples)** to see how it works.
3. **Use `rf"..."` for patterns** to avoid escaping issues. See
[Using Raw f-strings](#using-raw-f-strings-for-regex-patterns).
4. **Check out [examples/](examples/)** to study some complete working examples.
---
> ### Important prerequisites
>
> `log-surgeon` uses token-based parsing, and its regex behavior differs from traditional engines.
> Read the [Key Concepts](#key-concepts) section before writing patterns.
>
> Critical differences between token-based parsing and traditional regex behavior:
>
> * `.*` only matches within a single token (not across delimiters)
> * `abc|def` requires grouping: use `(abc)|(def)` instead
> * Use `{0,1}` for optional patterns, NOT `?`
>
> **Tip:** Use raw f-strings (`rf"..."`) for regex patterns. See
> [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for more details.
---
### Quick start examples
Use the following examples to get started.
#### Basic parsing
The following code parses a simple log event with `log-surgeon`.
```python
from log_surgeon import Parser, PATTERN
# Parse a sample log event
log_line = "16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\n"
# Create a parser and define extraction patterns
parser = Parser()
parser.add_var("resource", rf"(?<memory_gb>{PATTERN.FLOAT}) GiB ram")
parser.compile()
# Parse a single event
event = parser.parse_event(log_line)
# Access extracted data
print(f"Message: {event.get_log_message().strip()}")
print(f"LogType: {event.get_log_type().strip()}")
print(f"Parsed Logs: {event}")
```
**Output:**
```
Message: 16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram
LogType: 16/05/04 04:24:58 INFO Registering worker with 1 core and <memory_gb> GiB ram
Parsed Logs: {
"memory_gb": "4.0"
}
```
We can see that the parser extracted structured data from the unstructured log line:
* **Message**: The original log line
* **LogType**: Template with variable placeholder `<memory_gb>` showing the pattern structure
* **Parsed variables**: Successfully extracted `memory_gb` value of "4.0" from the pattern match
#### Try it yourself
Copy this code and modify the pattern to extract both `memory_gb` AND `cores`:
```python
from log_surgeon import Parser, PATTERN
log_line = "16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\n"
parser = Parser()
# TODO: Add pattern to capture both "1" (cores) and "4.0" (memory_gb)
parser.add_var("resource", rf"...")
parser.compile()
event = parser.parse_event(log_line)
print(f"Cores: {event['cores']}, Memory: {event['memory_gb']}")
```
<details>
<summary>Solution</summary>
```python
parser.add_var("resource", rf"(?<cores>\d+) core and (?<memory_gb>{PATTERN.FLOAT}) GiB ram")
```
</details>
---
#### Multiple capture groups
The following code parses a more complex, multi-line log event.
```python
from log_surgeon import Parser, PATTERN
# Parse a sample log event
log_line = """16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:750)
"""
# Create a parser and define extraction patterns
parser = Parser()
# Add timestamp pattern
parser.add_timestamp("TIMESTAMP_SPARK_1_6", rf"\d{{2}}/\d{{2}}/\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
# Add variable patterns
parser.add_var("SYSTEM_LEVEL", rf"(?<level>(INFO)|(WARN)|(ERROR))")
parser.add_var("SPARK_HOST_IP_PORT", rf"(?<spark_host>spark\-{PATTERN.INT})/(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})")
parser.add_var(
"SYSTEM_EXCEPTION",
rf"(?<system_exception_type>({PATTERN.JAVA_PACKAGE_SEGMENT})+[{PATTERN.JAVA_IDENTIFIER_CHARSET}]*Exception): "
rf"(?<system_exception_msg>{PATTERN.LOG_LINE})"
)
parser.add_var(
    "SYSTEM_STACK_TRACE",
    rf"\s{{1,4}}at (?<system_stack>{PATTERN.JAVA_STACK_LOCATION})"
)
parser.compile()
# Parse a single event
event = parser.parse_event(log_line)
# Access extracted data
print(f"Message: {event.get_log_message().strip()}")
print(f"LogType: {event.get_log_type().strip()}")
print(f"Parsed Logs: {event}")
```
**Output:**
```
Message: 16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:750)
LogType: <timestamp> <level> server.TransportChannelHandler: Exception in connection from <spark_host>/<system_ip>:<system_port>
<system_exception_type>: <system_exception_msg><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine>
Parsed Logs: {
"timestamp": "16/05/04 12:22:37",
"level": "WARN",
"spark_host": "spark-35",
"system_ip": "192.168.10.50",
"system_port": "55392",
"system_exception_type": "java.io.IOException",
"system_exception_msg": "Connection reset by peer",
"system_stack": [
"sun.nio.ch.FileDispatcherImpl.read0(Native Method)",
"sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)",
"sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)",
"sun.nio.ch.IOUtil.read(IOUtil.java:192)",
"sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)",
"io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)",
"io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)",
"io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)",
"io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)",
"io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)",
"io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)",
"io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)",
"io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)",
"io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)",
"java.lang.Thread.run(Thread.java:750)"
]
}
```
The parser extracted **multiple named capture groups** from a complex multi-line Java stack trace:
* **Scalar fields**: `timestamp`, `level`, `spark_host`, `system_ip`, `system_port`,
`system_exception_type`, `system_exception_msg`
* **Array field**: `system_stack` contains all 15 stack trace locations (demonstrates automatic
aggregation of repeated capture groups)
* **LogType**: Template shows the structure with `<newLine>` markers indicating line boundaries in
the original log
---
#### Stream parsing
When parsing log streams or files, timestamps are **required** to perform contextual anchoring.
Timestamps act as delimiters that separate individual log events, enabling the parser to correctly
group multi-line entries (like stack traces) into single events.
```python
from log_surgeon import Parser, PATTERN
# Parse from string (automatically converted to io.StringIO)
SAMPLE_LOGS = """16/05/04 04:31:13 INFO master.Master: Registering app SparkSQL::192.168.10.76
16/05/04 12:32:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:750)
16/05/04 04:37:53 INFO master.Master: 192.168.10.76:41747 got disassociated, removing it.
"""
# Define parser with patterns
parser = Parser()
# REQUIRED: Timestamp acts as contextual anchor to separate individual log events in the stream
parser.add_timestamp("TIMESTAMP_SPARK_1_6", rf"\d{{2}}/\d{{2}}/\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
parser.add_var("SYSTEM_LEVEL", rf"(?<level>(INFO)|(WARN)|(ERROR))")
parser.add_var("SPARK_APP_NAME", rf"(?<spark_app_name>SparkSQL::{PATTERN.IPV4})")
parser.add_var("SPARK_HOST_IP_PORT", rf"(?<spark_host>spark\-{PATTERN.INT})/(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})")
parser.add_var(
"SYSTEM_EXCEPTION",
rf"(?<system_exception_type>({PATTERN.JAVA_PACKAGE_SEGMENT})+[{PATTERN.JAVA_IDENTIFIER_CHARSET}]*Exception): "
rf"(?<system_exception_msg>{PATTERN.LOG_LINE})"
)
parser.add_var(
    "SYSTEM_STACK_TRACE", rf"\s{{1,4}}at (?<system_stack>{PATTERN.JAVA_STACK_LOCATION})"
)
parser.add_var("IP_PORT", rf"(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})")
parser.compile()
# Stream parsing: iterate over multi-line log events
for idx, event in enumerate(parser.parse(SAMPLE_LOGS)):
print(f"log-event-{idx} log template type:{event.get_log_type().strip()}")
```
**Output:**
```
log-event-0 log template type:<timestamp> <level> master.Master: Registering app <spark_app_name>
log-event-1 log template type:<timestamp> <level> server.TransportChannelHandler: Exception in connection from <spark_host>/<system_ip>:<system_port>
<system_exception_type>: <system_exception_msg><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack>
log-event-2 log template type:<timestamp> <level> master.Master: <system_ip>:<system_port> got disassociated, removing it.<newLine>
```
The parser successfully separated the log stream into **three distinct events** using timestamps as
contextual anchors:
* **Event 0**: Single-line app registration log
* **Event 1**: Multi-line exception with 15 stack trace lines (demonstrates how timestamps bind
multi-line events together)
* **Event 2**: Single-line disassociation log
Each log type shows the template structure with variable placeholders (`<level>`, `<system_ip>`,
etc.), enabling pattern-based log analysis and grouping.
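Because `parse()` also accepts file objects (see the [Parser reference](#parser)), you can stream
directly from a log file instead of an in-memory string. A minimal sketch, assuming the compiled
`parser` from above and a hypothetical `spark.log` file containing the sample logs:

```python
# Stream events straight from a file on disk.
with open("spark.log", "r") as log_file:  # "spark.log" is hypothetical
    for idx, event in enumerate(parser.parse(log_file)):
        print(f"log-event-{idx}: {event.get_log_type().strip()}")
```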
---
#### Using `PATTERN` constants
The `PATTERN` class provides pre-built regex patterns for common log elements like IP addresses,
UUIDs, numbers, and file paths. See the [PATTERN reference](#pattern) for the complete list of
available patterns.
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
parser.add_var("network", rf"IP: (?<ip>{PATTERN.IPV4}) UUID: (?<id>{PATTERN.UUID})")
parser.add_var("metrics", rf"value=(?<value>{PATTERN.FLOAT})")
parser.compile()
log_line = "IP: 192.168.1.1 UUID: 550e8400-e29b-41d4-a716-446655440000 value=42.5"
event = parser.parse_event(log_line)
print(f"IP: {event['ip']}")
print(f"UUID: {event['id']}")
print(f"Value: {event['value']}")
```
**Output:**
```
IP: 192.168.1.1
UUID: 550e8400-e29b-41d4-a716-446655440000
Value: 42.5
```
---
#### Export to DataFrame
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var(
"metric",
rf"metric=(?<metric_name>\w+) value=(?<value>\d+)"
)
parser.compile()
log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
"""
# Create a query and export to DataFrame
query = (
Query(parser)
.select(["metric_name", "value"])
.from_(log_data)
.validate_query()
)
df = query.to_dataframe()
print(df)
```
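Assuming all three lines match, the printed DataFrame should look roughly like this (exact spacing
depends on pandas):

```
  metric_name value
0         cpu    42
1      memory   100
2        disk     7
```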
---
#### Filtering events
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()
log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
2024-01-01 INFO: metric=cpu value=85
"""
# Filter events where value > 50
query = (
Query(parser)
.select(["metric_name", "value"])
.from_(log_data)
.filter(lambda event: int(event['value']) > 50)
.validate_query()
)
df = query.to_dataframe()
print(df)
# Output:
# metric_name value
# 0 memory 100
# 1 cpu 85
```
---
#### Including log template type and log message
Use the special fields `@log_type` and `@log_message` to include the log template and the original
message alongside extracted variables:
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.compile()
log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 WARN: Processing value=100
"""
# Select log type, message, and all variables
query = (
Query(parser)
.select(["@log_type", "@log_message", "*"])
.from_(log_data)
.validate_query()
)
df = query.to_dataframe()
print(df)
# Output:
# @log_type @log_message value
# 0 <timestamp> INFO: Processing <metric> 2024-01-01 INFO: Processing value=42 42
# 1 <timestamp> WARN: Processing <metric> 2024-01-01 WARN: Processing value=100 100
```
The `"*"` wildcard expands to all variables defined in the schema and can be combined with other fields like `@log_type` and `@log_message`.
---
#### Analyzing log types
Discover and analyze log patterns in your data using log type analysis methods:
```python
from log_surgeon import Parser, Query
parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.add_var("status", rf"status=(?<status>\w+)")
parser.compile()
log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 INFO: Processing value=100
2024-01-01 WARN: System status=degraded
2024-01-01 INFO: Processing value=7
2024-01-01 ERROR: System status=failed
"""
query = Query(parser).from_(log_data)
# Get all unique log types
print("Unique log types:")
for log_type in query.get_log_types():
print(f" {log_type}")
# Reset stream for next analysis
query.from_(log_data)
# Get log type occurrence counts
print("\nLog type counts:")
counts = query.get_log_type_counts()
for log_type, count in sorted(counts.items(), key=lambda x: -x[1]):
print(f" {count:3d} {log_type}")
# Reset stream for next analysis
query.from_(log_data)
# Get sample messages for each log type
print("\nLog type samples:")
samples = query.get_log_type_with_sample(sample_size=2)
for log_type, messages in samples.items():
print(f" {log_type}")
for msg in messages:
print(f" - {msg.strip()}")
```
**Output:**
```
Unique log types:
<timestamp> INFO: Processing <metric>
<timestamp> WARN: System <status>
<timestamp> ERROR: System <status>
Log type counts:
3 <timestamp> INFO: Processing <metric>
1 <timestamp> WARN: System <status>
1 <timestamp> ERROR: System <status>
Log type samples:
<timestamp> INFO: Processing <metric>
- 2024-01-01 INFO: Processing value=42
- 2024-01-01 INFO: Processing value=100
<timestamp> WARN: System <status>
- 2024-01-01 WARN: System status=degraded
<timestamp> ERROR: System <status>
- 2024-01-01 ERROR: System status=failed
```
---
## Key concepts
> **CRITICAL: You must understand these concepts to use `log-surgeon` correctly.**
>
> `log-surgeon` works **fundamentally differently** from traditional regex engines like Python's
> `re` module, PCRE, or JavaScript regex. Skipping this section may lead to patterns that don't
> work as expected.
### Token-based parsing and delimiters
**CRITICAL:** `log-surgeon` uses **token-based** parsing, not character-based regex matching like
traditional regex engines. This is the most important difference that affects how patterns work.
#### How tokenization works
Delimiters are characters used to split log messages into tokens. The default delimiters include:
- Whitespace: space, tab (`\t`), newline (`\n`), carriage return (`\r`)
- Punctuation: `:`, `,`, `!`, `;`, `%`, `@`, `/`, `(`, `)`, `[`, `]`
For example, with default delimiters, the log message:
```
"abc def ghi"
```
is tokenized into three tokens: `["abc", "def", "ghi"]`
You can customize delimiters when creating a Parser:
```python
parser = Parser(delimiters=r" \t\n,:") # Custom delimiters
```
#### Token-based pattern matching
**Critical:** Patterns like `.*` only match **within a single token**, not across multiple tokens or delimiters.
```python
from log_surgeon import Parser
parser = Parser() # Default delimiters include space
parser.add_var("token", rf"(?<match>d.*)")
parser.compile()
# With "abc def ghi" tokenized as ["abc", "def", "ghi"]
event = parser.parse_event("abc def ghi")
# Matches only "def" (single token starting with 'd')
# Does NOT match "def ghi" (would cross token boundary)
print(event['match']) # Output: "def"
```
**In a traditional regex engine**, `d.*` would match `"def ghi"` (everything from 'd' to end).
**In log-surgeon**, `d.*` matches only `"def"` because patterns cannot cross delimiter boundaries.
#### Why token-based?
Token-based parsing enables:
- **Faster parsing** by reducing search space
- **Predictable behavior** aligned with log structure
- **Efficient log type generation** for analytics
#### Working with token boundaries
To match across multiple tokens, use **character classes** that include the delimiter, like
`[a-z ]*`, instead of `.*`:
```python
from log_surgeon import Parser
parser = Parser() # Default delimiters include space
# Using .* - only matches within a single token
parser.add_var("wrong", rf"(?<match>d.*)") # Matches only "def"
# Using character classes - matches across tokens
parser.add_var("correct", rf"(?<match>d[a-z ]*i)") # Matches "def ghi"
parser.compile()
event = parser.parse_event("abc def ghi")
print(event['match']) # Output: "def ghi"
```
**Key Rule:** Character classes that include delimiter characters, such as `[a-z ]*` or `[\w\s]*`,
can match across token boundaries; `.*` cannot.
#### Alternation requires grouping
**CRITICAL:** Alternation (`|`) works differently in log-surgeon compared to traditional regex
engines. You **must** use parentheses to group alternatives.
```python
from log_surgeon import Parser
parser = Parser()
# WRONG: Without grouping - matches "ab" AND ("c" OR "d") AND "ef"
parser.add_var("wrong", rf"(?<word>abc|def)")
# In log-surgeon, this is interpreted as: "ab" + "c|d" + "ef"
# Matches: "abcef" or "abdef" (NOT "abc" or "def")
# CORRECT: With grouping - matches "abc" OR "def"
parser.add_var("correct", rf"(?<word>(abc)|(def))")
# Matches: "abc" or "def"
parser.compile()
```
**In traditional regex engines**, `abc|def` means "abc" OR "def".
**In log-surgeon**, `abc|def` means "ab" + ("c" OR "d") + "ef".
**Key Rule:** Always use `(abc)|(def)` syntax for alternation to match complete alternatives.
```python
# More examples:
parser.add_var("level", rf"(?<level>(ERROR)|(WARN)|(INFO))") # Correct
parser.add_var("status", rf"(?<status>(success)|(failure))") # Correct
parser.add_var("bad", rf"(?<status>success|failure)") # Wrong - unexpected behavior
```
#### Optional patterns
For optional patterns, use `{0,1}` instead of `*`:
```python
from log_surgeon import Parser
parser = Parser()
# Avoid using * for optional patterns (matches 0 or more)
parser.add_var("avoid", rf"(?<level>(ERROR)|(WARN))*") # Can match empty string or multiple reps
# Do not use ? for optional patterns
parser.add_var("avoid2", rf"(?<level>(ERROR)|(WARN))?") # May not work as expected
# Use {0,1} for optional patterns (matches 0 or 1)
parser.add_var("optional", rf"(?<level>(ERROR)|(WARN)){0,1}") # Matches 0 or 1 occurrence
parser.compile()
```
**Best practice:** Use `{0,1}` for optional elements (doubled to `{{0,1}}` inside an f-string).
Avoid `*` (0 or more) and `?` for optional matching.
You can also explicitly include delimiters in your pattern:
```python
# To match "def ghi", explicitly include the space delimiter
parser.add_var("multi", rf"(?<match>d\w+\s+\w+)")
# This matches "def " as one token segment, followed by "ghi"
```
Or adjust your delimiters to change tokenization behavior:
```python
# Use only newline as delimiter to treat entire lines as tokens
parser = Parser(delimiters=r"\n")
```
### Named capture groups
Use named capture groups in regex patterns to extract specific fields:
```python
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
```
The syntax `(?<name>pattern)` creates a capture group that can be accessed as `event['name']`.
**Note:** See [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for best practices on
writing regex patterns.
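For instance, a minimal end-to-end sketch (the pattern and field names mirror the Query examples
later in this document):

```python
from log_surgeon import Parser

parser = Parser()
# Two named groups, `metric_name` and `value`, become accessible fields.
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

event = parser.parse_event("2024-01-01 INFO: metric=cpu value=42\n")
print(event["metric_name"])  # cpu
print(event["value"])        # 42
```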
### Using raw f-strings for regex patterns
> **⚠️ STRONGLY RECOMMENDED: Use raw f-strings (`rf"..."`) for all regex patterns.**
>
> While not absolutely required, using regular strings will likely cause escaping issues and pattern
> failures. Raw f-strings prevent these problems.
Raw f-strings combine the benefits of:
- **Raw strings (`r"..."`)**: No need to double-escape regex special characters like `\d`, `\w`,
`\n`
- **f-strings (`f"..."`)**: Easy interpolation of variables and pattern constants
#### Why use raw f-strings?
```python
# Without raw strings - requires double-escaping
parser.add_var("metric", "value=(\\d+)") # Hard to read, error-prone
# With raw f-strings - single escaping, clean and readable
parser.add_var("metric", rf"value=(?<value>\d+)")
```
#### Watch out for braces in f-strings
When using f-strings, literal `{` and `}` characters must be escaped by doubling them:
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
# Correct: Escape literal braces in regex
parser.add_var("json", rf"data={{(?<content>[^}}]+)}}") # Matches: data={...}
parser.add_var("range", rf"range={{(?<min>\d+),(?<max>\d+)}}") # Matches: range={10,20}
# Using PATTERN constants with interpolation
parser.add_var("ip", rf"IP: (?<ip>{PATTERN.IPV4})")
parser.add_var("float", rf"value=(?<val>{PATTERN.FLOAT})")
# Common regex patterns
parser.add_var("digits", rf"\d+ items") # No double-escaping needed
parser.add_var("word", rf"name=(?<name>\w+)")
parser.add_var("whitespace", rf"split\s+by\s+spaces")
parser.compile()
```
#### Examples: raw f-strings vs regular strings
```python
# Regular string - requires double-escaping
parser.add_var("path", "path=(?<path>\\w+/\\w+)") # Hard to read
# Raw f-string - natural regex syntax
parser.add_var("path", rf"path=(?<path>\w+/\w+)") # Clean and readable
# With interpolation
log_level = "INFO|WARN|ERROR"
parser.add_var("level", rf"(?<level>{log_level})") # Easy to compose
```
**Recommendation:** Consistently use `rf"..."` for all regex patterns. This approach:
- Avoids double-escaping mistakes that break patterns
- Makes patterns more readable
- Allows easy use of `PATTERN` constants and variables
- Only requires watching for literal braces `{` and `}` in f-strings (escape as `{{` and `}}`)
Using regular strings (`"..."`) will require double-escaping (e.g., `"\\d+"`) which is error-prone
and can be hard to read.
### Logical vs. physical names
Internally, log-surgeon uses "physical" names (e.g., `CGPrefix0`, `CGPrefix1`) for capture groups,
while you work with "logical" names (e.g., `user_id`, `thread`). The `GroupNameResolver` handles
this mapping automatically.
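You normally never interact with this mapping directly, but the following sketch illustrates the
behavior using the `GroupNameResolver` API from the [reference](#groupnameresolver) (the import
path and prefix string are assumptions):

```python
from log_surgeon import GroupNameResolver  # import path assumed

resolver = GroupNameResolver("CGPrefix")  # prefix for generated physical names

# Each call mints a fresh physical name for a logical name.
physical = resolver.create_new_physical_name("user_id")
print(physical)                                # e.g., "CGPrefix0"
print(resolver.get_logical_name(physical))     # "user_id"
print(resolver.get_physical_names("user_id"))  # e.g., {"CGPrefix0"}
```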
### Schema format
The schema defines delimiters, timestamps, and variables for parsing:
```
// schema delimiters
delimiters: \t\r\n:,!;%@/\(\)\[\]
// schema timestamps
timestamp:<timestamp_regex>
// schema variables
variable_name:<variable_regex>
```
When using the fluent API (`Parser.add_var()` and `Parser.compile()`), the schema is built automatically.
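If you need the schema string itself, for inspection or for `Parser.load_schema()`, you can build
it explicitly with the `SchemaCompiler` described in the [reference](#schemacompiler). A minimal
sketch (import path assumed):

```python
from log_surgeon import Parser, SchemaCompiler  # import path assumed

compiler = SchemaCompiler()
compiler.add_timestamp("TS", rf"\d{{4}}-\d{{2}}-\d{{2}}")
compiler.add_var("metric", rf"value=(?<value>\d+)")

schema = compiler.compile()  # the raw schema string in the format shown above
resolver = compiler.get_capture_group_name_resolver()

parser = Parser()
parser.load_schema(schema, resolver)  # configure the parser from the schema
```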
### Common pitfalls
**Pattern doesn't match anything**
- Check: Are you using `.*` to match across tokens? Use `[a-zA-Z ]*` instead
- Check: Did you forget to call `parser.compile()`?
- Check: Are your delimiters splitting tokens unexpectedly?
**Alternation not working (`abc|def`)**
- Problem: `(?<name>abc|def)` doesn't match "abc" or "def" as expected
- Solution: Use `(?<name>(abc)|(def))` with explicit grouping
**Pattern works in regex tester but not here**
- Remember: log-surgeon is token-based, not character-based
- Traditional regex engines match across entire strings
- log-surgeon matches within token boundaries (delimited by spaces, colons, etc.)
- Read: [Token-Based Parsing](#token-based-parsing-and-delimiters)
**Escape sequence errors in Python**
- Problem: `parser.add_var("digits", "(?<num>\d+)")` raises SyntaxError
- Solution: Use `rf"..."` (raw f-string) instead of `"..."` or `f"..."`
- Example: `parser.add_var("digits", rf"(?<num>\d+)")`
**Optional pattern matching incorrectly**
- Problem: Using `?` or `*` for optional patterns
- Solution: Use `{0,1}` for optional elements
- Example: `(?<level>(ERROR)|(WARN)){0,1}` for optional log level
---
## Reference
| Task | Syntax |
|------|--------|
| Named capture | `(?<name>pattern)` |
| Alternation | `(?<name>(opt1)\|(opt2))` (NOT `opt1\|opt2`) |
| Optional | `{0,1}` (NOT `?` or `*`) |
| Match across tokens | Use `[a-z ]*` (NOT `.*`) |
| Pattern string | `rf"..."` (raw f-string recommended) |
| All variables | `.select(["*"])` |
| Log type | `.select(["@log_type"])` |
| Original message | `.select(["@log_message"])` |
### Parser
High-level parser for extracting structured data from unstructured log messages.
#### Constructor
- `Parser(delimiters: str = r" \t\r\n:,!;%@/\(\)\[\]")`
- Initialize a parser with optional custom delimiters
- Default delimiters include space, tab, newline, and common punctuation
#### Methods
- `add_var(name: str, regex: str, hide_var_name_if_named_group_present: bool = True) -> Parser`
- Add a variable pattern to the parser's schema
- Supports named capture groups using `(?<name>)` syntax
- Use raw f-strings (`rf"..."`) for regex patterns (see [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns))
- Returns self for method chaining
- `add_timestamp(name: str, regex: str) -> Parser`
- Add a timestamp pattern to the parser's schema
- Returns self for method chaining
- `compile(enable_debug_logs: bool = False) -> None`
- Build and initialize the parser with the configured schema
- Must be called after adding variables and before parsing
- Set `enable_debug_logs=True` to output debug information to stderr
- `load_schema(schema: str, group_name_resolver: GroupNameResolver) -> None`
- Load a pre-built schema string to configure the parser
- `parse(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Generator[LogEvent, None, None]`
- Parse all log events from a string, file object, or stream
- Accepts strings, text/binary file objects, StringIO, or BytesIO
- Yields LogEvent objects for each parsed event
- `parse_event(payload: str) -> LogEvent | None`
- Parse a single log event from a string (convenience method)
- Wraps `parse()` and returns the first event
- Returns LogEvent or None if no event found
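Since `add_var()` and `add_timestamp()` return `self`, a parser can be configured fluently;
`compile()` returns `None`, so call it last. A short sketch:

```python
from log_surgeon import Parser, PATTERN

parser = Parser()
(
    parser.add_timestamp("TS", rf"\d{{4}}-\d{{2}}-\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
    .add_var("ip", rf"(?<ip>{PATTERN.IPV4})")
)
parser.compile()
```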
### LogEvent
Represents a parsed log event with extracted variables.
#### Methods
- `get_log_message() -> str`
- Get the original log message
- `get_log_type() -> str`
- Get the generated log type (template) with logical group names
- `get_capture_group(logical_capture_group_name: str, raw_output: bool = False) -> str | list | None`
- Get the value of a capture group by its logical name
- If `raw_output=False` (default), single values are unwrapped from lists
- Returns None if capture group not found
- `get_capture_group_str_representation(field: str, raw_output: bool = False) -> str`
- Get the string representation of a capture group value
- `get_resolved_dict() -> dict[str, str | list]`
- Get a dictionary with all capture groups using logical (user-defined) names
- Physical names (CGPrefix*) are converted to logical names
- Timestamp fields are consolidated under "timestamp" key
- Single-value lists are unwrapped to scalar values
- "@LogType" is excluded from the output
- `__getitem__(key: str) -> str | list`
- Access capture group values by name (e.g., `event['field_name']`)
- Shorthand for `get_capture_group(key, raw_output=False)`
- `__str__() -> str`
- Get formatted JSON representation of the log event with logical group names
- Uses `get_resolved_dict()` internally
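A short sketch of the accessors above, assuming the `metric` parser from the Query examples:

```python
event = parser.parse_event("2024-01-01 INFO: metric=cpu value=42\n")

print(event["value"])  # "42" (shorthand for get_capture_group)
print(event.get_capture_group("value", raw_output=True))  # ["42"] (list kept)
print(event.get_resolved_dict())  # all capture groups with logical names
```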
### Query
Query builder for parsing log events into structured data formats.
#### Constructor
- `Query(parser: Parser)`
- Initialize a query with a configured parser
#### Methods
- `select(fields: list[str]) -> Query`
- Select fields to extract from log events
- Supports variable names, `"*"` for all variables, `"@log_type"` for log type, and `"@log_message"` for original message
- The `"*"` wildcard can be combined with other fields (e.g., `["@log_type", "*"]`)
- Returns self for method chaining
- `filter(predicate: Callable[[LogEvent], bool]) -> Query`
- Filter log events using a predicate function
- Predicate receives a LogEvent and returns True to include it, False to exclude
- Returns self for method chaining
- Example: `query.filter(lambda event: int(event['value']) > 50)`
- `from_(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
- Set the input source to parse
- Accepts strings, text/binary file objects, StringIO, or BytesIO
- Strings are automatically wrapped in StringIO
- Returns self for method chaining
- `select_from(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
- Alias for `from_()`
- Returns self for method chaining
- `validate_query() -> Query`
- Validate that the query is properly configured
- Returns self for method chaining
- `to_dataframe() -> pd.DataFrame`
- Convert parsed events to a pandas DataFrame
- `to_df() -> pd.DataFrame`
- Alias for `to_dataframe()`
- `to_arrow() -> pa.Table`
- Convert parsed events to a PyArrow Table
- `to_pa() -> pa.Table`
- Alias for `to_arrow()`
- `get_rows() -> list[list]`
- Extract rows of field values from parsed events
- `get_vars() -> KeysView[str]`
- Get all variable names (logical capture group names) defined in the schema
- `get_log_types() -> Generator[str, None, None]`
- Get all unique log types from parsed events
- Yields log types in the order they are first encountered
- Useful for discovering log patterns in your data
- `get_log_type_counts() -> dict[str, int]`
- Get count of occurrences for each unique log type
- Returns dictionary mapping log types to their counts
- Useful for analyzing log type distribution
- `get_log_type_with_sample(sample_size: int = 3) -> dict[str, list[str]]`
- Get sample log messages for each unique log type
- Returns dictionary mapping log types to lists of sample messages
- Useful for understanding what actual messages match each template
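A compact sketch tying several of these methods together:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
"""

query = (
    Query(parser)
    .select(["@log_type", "*"])
    .from_(log_data)
    .filter(lambda event: event["metric_name"] == "cpu")
    .validate_query()
)
table = query.to_arrow()  # PyArrow Table; use to_dataframe() for pandas
print(table)
```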
### SchemaCompiler
Compiler for constructing log-surgeon schema definitions.
#### Constructor
- `SchemaCompiler(delimiters: str = DEFAULT_DELIMITERS)`
- Initialize a schema compiler with optional custom delimiters
#### Methods
- `add_var(name: str, regex: str, hide_var_name_if_named_group_present: bool = True) -> SchemaCompiler`
- Add a variable pattern to the schema
- Returns self for method chaining
- `add_timestamp(name: str, regex: str) -> SchemaCompiler`
- Add a timestamp pattern to the schema
- Returns self for method chaining
- `remove_var(var_name: str) -> SchemaCompiler`
- Remove a variable from the schema
- Returns self for method chaining
- `get_var(var_name: str) -> Variable`
- Get a variable by name
- `compile() -> str`
- Compile the final schema string
- `get_capture_group_name_resolver() -> GroupNameResolver`
- Get the resolver for mapping logical to physical capture group names
### GroupNameResolver
Bidirectional mapping between logical (user-defined) and physical (auto-generated) group names.
#### Constructor
- `GroupNameResolver(physical_name_prefix: str)`
- Initialize with a prefix for auto-generated physical names
#### Methods
- `create_new_physical_name(logical_name: str) -> str`
- Create a new unique physical name for a logical name
- Each call generates a new physical name
- `get_physical_names(logical_name: str) -> set[str]`
- Get all physical names associated with a logical name
- `get_logical_name(physical_name: str) -> str`
- Get the logical name for a physical name
- `get_all_logical_names() -> KeysView[str]`
- Get all logical names that have been registered
### PATTERN
Collection of pre-built regex patterns optimized for log parsing. These patterns follow log-surgeon's syntax requirements and are ready to use with named capture groups.
#### Available patterns
**Network Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.UUID` | UUID (Universally Unique Identifier) | `550e8400-e29b-41d4-a716-446655440000` |
| `PATTERN.IP_OCTET` | Single IPv4 octet (0-255) | `192`, `10`, `255` |
| `PATTERN.IPV4` | IPv4 address | `192.168.1.1`, `10.0.0.1` |
| `PATTERN.PORT` | Network port number (1-5 digits) | `80`, `8080`, `65535` |
**Numeric Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.INT` | Integer with optional negative sign | `42`, `-123`, `0` |
| `PATTERN.FLOAT` | Float with optional negative sign | `3.14`, `-123.456`, `0.5` |
**File System Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.LINUX_FILE_NAME_CHARSET` | Character set for Linux file names | `a-zA-Z0-9 ._-` |
| `PATTERN.LINUX_FILE_NAME` | Linux file name | `app.log`, `config-2024.yaml` |
| `PATTERN.LINUX_FILE_PATH` | Linux file path (relative) | `logs/app.log`, `var/log/system.log` |
**Character Sets and Word Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.JAVA_IDENTIFIER_CHARSET` | Java identifier character set | `a-zA-Z0-9_` |
| `PATTERN.JAVA_IDENTIFIER` | Java identifier | `myVariable`, `$value`, `Test123` |
| `PATTERN.LOG_LINE_CHARSET` | Common log line characters | Alphanumeric + symbols + whitespace |
| `PATTERN.LOG_LINE` | General log line content | `Error: connection timeout` |
| `PATTERN.LOG_LINE_NO_WHITE_SPACE_CHARSET` | Log line chars without whitespace | Alphanumeric + symbols only |
| `PATTERN.LOG_LINE_NO_WHITE_SPACE` | Log content without spaces | `ERROR`, `/var/log/app.log` |
**Java-Specific Patterns**
| Pattern | Description | Example Match |
|---------|-------------|---------------|
| `PATTERN.JAVA_LITERAL_CHARSET` | Java literal character set | `a-zA-Z0-9_$` |
| `PATTERN.JAVA_PACKAGE_SEGMENT` | Single Java package segment | `com.`, `example.` |
| `PATTERN.JAVA_CLASS_NAME` | Java class name | `MyClass`, `ArrayList` |
| `PATTERN.JAVA_FULLY_QUALIFIED_CLASS_NAME` | Fully qualified class name | `java.util.ArrayList` |
| `PATTERN.JAVA_LOGGING_CODE_LOCATION_HINT` | Java logging location hint | `~[MyClass.java:42?]` |
| `PATTERN.JAVA_STACK_LOCATION` | Java stack trace location | `java.util.ArrayList.add(ArrayList.java:123)` |
#### Example usage
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
# Network patterns
parser.add_var("network", rf"IP: (?<ip>{PATTERN.IPV4}) Port: (?<port>{PATTERN.PORT})")
# Numeric patterns
parser.add_var("metrics", rf"value=(?<value>{PATTERN.FLOAT}) count=(?<count>{PATTERN.INT})")
# File system patterns
parser.add_var("file", rf"Opening (?<filepath>{PATTERN.LINUX_FILE_PATH})")
# Java patterns
parser.add_var("exception", rf"at (?<stack>{PATTERN.JAVA_STACK_LOCATION})")
parser.compile()
```
#### Composing patterns
PATTERN constants can be composed to build more complex patterns:
```python
from log_surgeon import Parser, PATTERN
parser = Parser()
# Combine multiple patterns
parser.add_var(
"server_info",
rf"Server (?<name>{PATTERN.JAVA_IDENTIFIER}) at (?<ip>{PATTERN.IPV4}):(?<port>{PATTERN.PORT})"
)
# Use character sets to build custom patterns
parser.add_var(
"custom_id",
rf"ID-(?<id>[{PATTERN.JAVA_IDENTIFIER_CHARSET}]+)"
)
parser.compile()
```
---
## Development
### Building from source
```bash
# Clone the repository
git clone https://github.com/y-scope/log-surgeon-ffi-py.git
cd log-surgeon-ffi-py
# Install the project in editable mode
pip install -e .
# Build the extension
cmake -S . -B build
cmake --build build
```
### Running tests
```bash
# Install test dependencies
pip install pytest
# Run tests
python -m pytest tests/
```
---
## License
Apache License 2.0 - See [LICENSE](LICENSE) for details.
---
## Links
- [Homepage](https://github.com/y-scope/log-surgeon-ffi-py)
- [Bug Tracker](https://github.com/y-scope/log-surgeon-ffi-py/issues)
- [log-surgeon C++ library](https://github.com/y-scope/log-surgeon)
---
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Raw data
{
"_id": null,
"home_page": null,
"name": "log-surgeon-ffi",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "logging, log-parsing, log-analysis, structured-data, performance, observability",
"author": null,
"author_email": "y-scope <info@yscope.com>",
"download_url": null,
"platform": null,
"description": "# `log-surgeon-ffi`\n\n`log-surgeon-ffi` provides Python foreign function interface (FFI) bindings for\n[`log-surgeon`](https://github.com/y-scope/log-surgeon).\n\n---\n\n## Quick navigation\n\n[**Overview**](#overview)\n* [Why `log-surgeon`?](#why-log-surgeon)\n* [Key capabilities](#key-capabilities)\n* [Structured output and downstream capabilities](#structured-output-and-downstream-capabilities)\n* [When to use `log-surgeon`](#when-to-use-log-surgeon)\n\n[**Getting started**](#getting-started)\n* [System requirements](#system-requirements)\n* [Installation](#installation)\n* [First steps](#first-steps)\n* [Important prerequisites](#important-prerequisites)\n* [Quick start examples](#quick-start-examples)\n\n[**Key concepts**](#key-concepts)\n* [Token-based parsing and delimiters](#token-based-parsing-and-delimiters)\n* [Named capture groups](#named-capture-groups)\n* [Using raw f-strings for regex patterns](#using-raw-f-strings-for-regex-patterns)\n\n[**Reference**](#reference)\n* [Parser API](#parser)\n* [Query API](#query)\n* [PATTERN constants](#pattern)\n\n[**Development**](#development)\n* [Building from source](#building-from-source)\n* [Running tests](#running-tests)\n\n---\n\n## Overview\n\n[`log-surgeon`](https://github.com/y-scope/log-surgeon), is a high-performance C++ library that\nenables efficient extraction of structured information from unstructured log files.\n\n### Why `log-surgeon`?\n\nTraditional regex engines are often slow to execute, prone to errors, and costly to maintain. For\nexample, Meta uses RE2 (a state-of-the-art regex engine) to parse logs, but they still face\nscalability and maintenance challenges, which limits extraction to a small set of fields such as\ntimestamps, levels, and component names.\n\n`log-surgeon` streamlines the process by identifying, extracting, and labeling variable values with\nsemantic context, and then inferring a log template in a single pass. `log-surgeon` is also built to\naccommodate structural variability. Values may shift position, appear multiple times, or change order\nentirely, but with `log-surgeon`, you simply define the variable patterns, and `log-surgeon`\nJIT-compiles a tagged-DFA state machine to drive the full pipeline.\n\n### Key capabilities\n\n* **Extract variables** from log messages using regex patterns with named capture groups\n* **Generate log types** (templates) automatically for log analysis\n* **Parse streams** efficiently for large-scale log processing\n* **Export data** to pandas DataFrames and PyArrow Tables\n\n### Structured output and downstream capabilities\n\nUnstructured log data is automatically transformed into structured semantic representations.\n\n* **Log types (templates)**: Variables are replaced with placeholders to form reusable templates.\n For example, roughly 200,000 Spark log messages can reduce to about 55 distinct templates, which\n supports pattern analysis and anomaly detection.\n\n* **Semantic Variables**: Extracted key-value pairs with semantic context (e.g., `app_id`,\n `app_name`, `worker_id`) can be used directly for analysis.\n\nThis structured output unlocks powerful downstream capabilities:\n\n* **Knowledge graph construction.** Build relationship graphs between entities extracted from logs\n (e.g., linking `app_id` \u2192 `app_name` \u2192 `worker_id`). 
The structured output fits tools such as\n [Stitch](https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhao), which\n uses flow reconstruction from logs to perform non-intrusive performance profiling and debugging\n across distributed systems.\n\n* **Template-based summarization.** Compress massive datasets into compact template sets for human\n and agent consumption. Templates act as natural tokens for LLMs. Instead of millions of raw lines,\n provide a small number of distinct templates with statistics.\n\n* **Hybrid search** Combine free-text search with structured queries. Log types enable\n auto-completion and query suggestions on large datasets. Instead of searching through millions of\n raw log lines, search across a compact set of templates first. Then project and filter on\n structured variables (e.g., `status == \"ERROR\"`, `response_time > 1000`), and aggregate for\n analysis.\n\n* **Agentic automation.** Agents can query by template, analyze variable distributions, identify\n anomalies, and automate debugging tasks using structured signals rather than raw text.\n\n### When to use `log-surgeon`\n\n**Good fit**\n* Large-scale log processing (millions of lines)\n* Extracting structured data from semi-structured logs\n* Generating log templates for analytics\n* Multi-line log events (stack traces, JSON dumps)\n* Performance-critical parsing\n\n**Not ideal**\n* Simple one-off text extraction (use Python `re` module)\n* Highly irregular text without consistent delimiters\n* Patterns requiring full PCRE features (lookahead, backreferences)\n\n---\n\n## Getting started\n\nFollow the instructions below to get started with `log-surgeon-ffi`.\n\n### System requirements\n\n- Python >= 3.9\n- pandas\n- pyarrow\n\n#### Build requirements\n\n- C++20 compatible compiler\n- CMake >= 3.15\n\n### Installation\n\nTo install the library with pandas and PyArrow support for DataFrame/Arrow table exports, run the\nfollowing command:\n\n```bash\npip install log-surgeon-ffi\n```\n\nTo verify your installation, run the following command:\n\n```bash\npython -c \"from log_surgeon import Parser; print('Installation successful.')\"\n```\n\n**Note:** If you only need core parsing without DataFrame or Arrow exports, you can install a\nminimal environment, although pandas and PyArrow are included by default for convenience.\n\n### First steps\n\nAfter installation, follow these steps:\n\n1. **Read [Key Concepts](#key-concepts).** Token based parsing differs from traditional regex.\n2. **Run a [Quick start example](#quick-start-examples)** to see how it works.\n3. **Use `rf\"...\"` for patterns** to avoid escaping issues. See\n [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns).\n4. **Check out [examples/](examples/)** to study some complete working examples.\n\n---\n\n> ### Important prerequisites\n> \n> `log-surgeon` uses token-based parsing, and its regex behavior differs from traditional engines.\n> Read the [Key Concepts](#key-concepts) section before writing patterns.\n> \n> Critical differences between token-based parsing and traditional regex behavior:\n> \n> * `.*` only matches within a single token (not across delimiters)\n> * `abc|def` requires grouping: use `(abc)|(def)` instead\n> * Use `{0,1}` for optional patterns, NOT `?`\n> \n> **Tip:** Use raw f-strings (`rf\"...\"`) for regex patterns. 
See \n> [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for more details.\n\n---\n\n### Quick start examples\n\nUse the following examples to get started.\n\n#### Basic parsing\n\nThe following code parses a simple log event with `log-surgeon`.\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\n# Parse a sample log event\nlog_line = \"16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\\n\"\n\n# Create a parser and define extraction patterns\nparser = Parser()\nparser.add_var(\"resource\", rf\"(?<memory_gb>{PATTERN.FLOAT}) GiB ram\")\nparser.compile()\n\n# Parse a single event\nevent = parser.parse_event(log_line)\n\n# Access extracted data\nprint(f\"Message: {event.get_log_message().strip()}\")\nprint(f\"LogType: {event.get_log_type().strip()}\")\nprint(f\"Parsed Logs: {event}\")\n```\n\n**Output:**\n```\nMessage: 16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\nLogType: 16/05/04 04:24:58 INFO Registering worker with 1 core and <memory_gb> GiB ram\nParsed Logs: {\n \"memory_gb\": \"4.0\"\n}\n```\n\nWe can see that the parser extracted structured data from the unstructured log line:\n* ***Message**: The original log line\n* **LogType**: Template with variable placeholder `<memory_gb>` showing the pattern structure\n* **Parsed variables**: Successfully extracted `memory_gb` value of \"4.0\" from the pattern match\n\n#### Try it yourself\n\nCopy this code and modify the pattern to extract both `memory_gb` AND `cores`:\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\nlog_line = \"16/05/04 04:24:58 INFO Registering worker with 1 core and 4.0 GiB ram\\n\"\nparser = Parser()\n# TODO: Add pattern to capture both \"1\" (cores) and \"4.0\" (memory_gb)\nparser.add_var(\"resource\", rf\"...\")\nparser.compile()\n\nevent = parser.parse_event(log_line)\nprint(f\"Cores: {event['cores']}, Memory: {event['memory_gb']}\")\n```\n\n<details>\n<summary>Solution</summary>\n\n```python\nparser.add_var(\"resource\", rf\"(?<cores>\\d+) core and (?<memory_gb>{PATTERN.FLOAT}) GiB ram\")\n```\n</details>\n\n---\n\n#### Multiple capture groups\n\nThe following code parses a more-complex log event.\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\n# Parse a sample log event\nlog_line = \"\"\"16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)\n at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\n at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\n at sun.nio.ch.IOUtil.read(IOUtil.java:192)\n at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\n at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\n at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\n at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\n at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\n at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\n at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\n at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\n at 
java.lang.Thread.run(Thread.java:750)\n\"\"\"\n\n# Create a parser and define extraction patterns\nparser = Parser()\n\n# Add timestamp pattern\nparser.add_timestamp(\"TIMESTAMP_SPARK_1_6\", rf\"\\d{{2}}/\\d{{2}}/\\d{{2}} \\d{{2}}:\\d{{2}}:\\d{{2}}\")\n\n# Add variable patterns\nparser.add_var(\"SYSTEM_LEVEL\", rf\"(?<level>(INFO)|(WARN)|(ERROR))\")\nparser.add_var(\"SPARK_HOST_IP_PORT\", rf\"(?<spark_host>spark\\-{PATTERN.INT})/(?<system_ip>{PATTERN.IPV4}):(?<system_port>{PATTERN.PORT})\")\nparser.add_var(\n \"SYSTEM_EXCEPTION\",\n rf\"(?<system_exception_type>({PATTERN.JAVA_PACKAGE_SEGMENT})+[{PATTERN.JAVA_IDENTIFIER_CHARSET}]*Exception): \"\n rf\"(?<system_exception_msg>{PATTERN.LOG_LINE})\"\n)\nparser.add_var(\n rf\"SYSTEM_STACK_TRACE\",\n rf\"(\\s{{1,4}}at (?<system_stack>{PATTERN.JAVA_STACK_LOCATION})\"\n)\nparser.compile()\n\n# Parse a single event\nevent = parser.parse_event(log_line)\n\n# Access extracted data\nprint(f\"Message: {event.get_log_message().strip()}\")\nprint(f\"LogType: {event.get_log_type().strip()}\")\nprint(f\"Parsed Logs: {event}\")\n```\n\n**Output:**\n```\nMessage: 16/05/04 12:22:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)\n at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\n at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\n at sun.nio.ch.IOUtil.read(IOUtil.java:192)\n at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\n at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\n at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\n at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\n at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\n at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\n at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\n at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\n at java.lang.Thread.run(Thread.java:750)\nLogType: <timestamp> <level> server.TransportChannelHandler: Exception in connection from <spark_host>/<system_ip>:<system_port>\n<system_exception_type>: <system_exception_msg><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine> at <system_stack><newLine>\nParsed Logs: {\n \"timestamp\": \"16/05/04 12:22:37\",\n \"level\": \"WARN\",\n \"spark_host\": \"spark-35\",\n \"system_ip\": \"192.168.10.50\",\n \"system_port\": \"55392\",\n \"system_exception_type\": \"java.io.IOException\",\n \"system_exception_msg\": \"Connection reset by peer\",\n \"system_stack\": [\n \"sun.nio.ch.FileDispatcherImpl.read0(Native Method)\",\n \"sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\",\n \"sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\",\n \"sun.nio.ch.IOUtil.read(IOUtil.java:192)\",\n 
\"sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\",\n \"io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\",\n \"io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\",\n \"io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\",\n \"io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\",\n \"io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\",\n \"io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\",\n \"io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\",\n \"io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\",\n \"io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\",\n \"java.lang.Thread.run(Thread.java:750)\"\n ]\n}\n```\n\nThe parser extracted **multiple named capture groups** from a complex multi-line Java stack trace:\n* **Scalar fields**: `timestamp`, `level`, `spark_host`, `system_ip`, `system_port`,\n `system_exception_type`, `system_exception_msg`\n* **Array field**: `system_stack` contains all 15 stack trace locations (demonstrates automatic\n aggregation of repeated capture groups)\n* **LogType**: Template shows the structure with `<newLine>` markers indicating line boundaries in\n the original log\n\n---\n\n#### Stream parsing\n\nWhen parsing log streams or files, timestamps are **required** to perform contextual anchoring.\nTimestamps act as delimiters that separate individual log events, enabling the parser to correctly\ngroup multi-line entries (like stack traces) into single events.\n\n```python\nfrom log_surgeon import Parser, PATTERN\n\n# Parse from string (automatically converted to io.StringIO)\nSAMPLE_LOGS = \"\"\"16/05/04 04:31:13 INFO master.Master: Registering app SparkSQL::192.168.10.76\n16/05/04 12:32:37 WARN server.TransportChannelHandler: Exception in connection from spark-35/192.168.10.50:55392\njava.io.IOException: Connection reset by peer\n at sun.nio.ch.FileDispatcherImpl.read0(Native Method)\n at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)\n at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)\n at sun.nio.ch.IOUtil.read(IOUtil.java:192)\n at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)\n at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)\n at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)\n at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)\n at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)\n at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\n at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\n at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\n at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\n at java.lang.Thread.run(Thread.java:750)\n16/05/04 04:37:53 INFO master.Master: 192.168.10.76:41747 got disassociated, removing it.\n\"\"\"\n\n# Define parser with patterns\nparser = Parser()\n# REQUIRED: Timestamp acts as contextual anchor to separate individual log events in the stream\nparser.add_timestamp(\"TIMESTAMP_SPARK_1_6\", rf\"\\d{{2}}/\\d{{2}}/\\d{{2}} 
---

#### Using `PATTERN` constants

The `PATTERN` class provides pre-built regex patterns for common log elements like IP addresses,
UUIDs, numbers, and file paths. See the [PATTERN reference](#pattern) for the complete list of
available patterns.

```python
from log_surgeon import Parser, PATTERN

parser = Parser()
parser.add_var("network", rf"IP: (?<ip>{PATTERN.IPV4}) UUID: (?<id>{PATTERN.UUID})")
parser.add_var("metrics", rf"value=(?<value>{PATTERN.FLOAT})")
parser.compile()

log_line = "IP: 192.168.1.1 UUID: 550e8400-e29b-41d4-a716-446655440000 value=42.5"
event = parser.parse_event(log_line)

print(f"IP: {event['ip']}")
print(f"UUID: {event['id']}")
print(f"Value: {event['value']}")
```

**Output:**
```
IP: 192.168.1.1
UUID: 550e8400-e29b-41d4-a716-446655440000
Value: 42.5
```

---

#### Export to DataFrame

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var(
    "metric",
    rf"metric=(?<metric_name>\w+) value=(?<value>\d+)"
)
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
"""

# Create a query and export to a DataFrame
query = (
    Query(parser)
    .select(["metric_name", "value"])
    .from_(log_data)
    .validate_query()
)

df = query.to_dataframe()
print(df)
```

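The same query can be materialized as a PyArrow Table via `to_arrow()` (see the
[Query API](#query)). A minimal sketch reusing the query shape above:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
"""

# Identical builder chain, but produce a PyArrow Table instead of a DataFrame
table = (
    Query(parser)
    .select(["metric_name", "value"])
    .from_(log_data)
    .validate_query()
    .to_arrow()
)
print(table)
```
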
---

#### Filtering events

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
2024-01-01 INFO: metric=disk value=7
2024-01-01 INFO: metric=cpu value=85
"""

# Filter events where value > 50
query = (
    Query(parser)
    .select(["metric_name", "value"])
    .from_(log_data)
    .filter(lambda event: int(event['value']) > 50)
    .validate_query()
)

df = query.to_dataframe()
print(df)
# Output:
#   metric_name value
# 0      memory   100
# 1         cpu    85
```

---

#### Including log template type and log message

Use the special fields `@log_type` and `@log_message` to include the log template and the original
message alongside extracted variables:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 WARN: Processing value=100
"""

# Select log type, message, and all variables
query = (
    Query(parser)
    .select(["@log_type", "@log_message", "*"])
    .from_(log_data)
    .validate_query()
)

df = query.to_dataframe()
print(df)
# Output:
#                                @log_type                           @log_message value
# 0  <timestamp> INFO: Processing <metric>   2024-01-01 INFO: Processing value=42    42
# 1  <timestamp> WARN: Processing <metric>  2024-01-01 WARN: Processing value=100   100
```

The `"*"` wildcard expands to all variables defined in the schema and can be combined with other
fields like `@log_type` and `@log_message`.

---

#### Analyzing log types

Discover and analyze log patterns in your data using the log type analysis methods:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.add_var("status", rf"status=(?<status>\w+)")
parser.compile()

log_data = """
2024-01-01 INFO: Processing value=42
2024-01-01 INFO: Processing value=100
2024-01-01 WARN: System status=degraded
2024-01-01 INFO: Processing value=7
2024-01-01 ERROR: System status=failed
"""

query = Query(parser).from_(log_data)

# Get all unique log types
print("Unique log types:")
for log_type in query.get_log_types():
    print(f"  {log_type}")

# Reset the stream for the next analysis
query.from_(log_data)

# Get log type occurrence counts
print("\nLog type counts:")
counts = query.get_log_type_counts()
for log_type, count in sorted(counts.items(), key=lambda x: -x[1]):
    print(f"  {count:3d} {log_type}")

# Reset the stream for the next analysis
query.from_(log_data)

# Get sample messages for each log type
print("\nLog type samples:")
samples = query.get_log_type_with_sample(sample_size=2)
for log_type, messages in samples.items():
    print(f"  {log_type}")
    for msg in messages:
        print(f"    - {msg.strip()}")
```

**Output:**
```
Unique log types:
  <timestamp> INFO: Processing <metric>
  <timestamp> WARN: System <status>
  <timestamp> ERROR: System <status>

Log type counts:
    3 <timestamp> INFO: Processing <metric>
    1 <timestamp> WARN: System <status>
    1 <timestamp> ERROR: System <status>

Log type samples:
  <timestamp> INFO: Processing <metric>
    - 2024-01-01 INFO: Processing value=42
    - 2024-01-01 INFO: Processing value=100
  <timestamp> WARN: System <status>
    - 2024-01-01 WARN: System status=degraded
  <timestamp> ERROR: System <status>
    - 2024-01-01 ERROR: System status=failed
```

---

## Key concepts

> **CRITICAL: You must understand these concepts to use `log-surgeon` correctly.**
>
> `log-surgeon` works **fundamentally differently** from traditional regex engines like Python's
> `re` module, PCRE, or JavaScript regex. Skipping this section may lead to patterns that don't
> work as expected.

### Token-based parsing and delimiters

**CRITICAL:** `log-surgeon` uses **token-based** parsing, not character-based regex matching like
traditional regex engines. This is the most important difference affecting how patterns work.

#### How tokenization works

Delimiters are characters used to split log messages into tokens. The default delimiters include:
- Whitespace: space, tab (`\t`), newline (`\n`), carriage return (`\r`)
- Punctuation: `:`, `,`, `!`, `;`, `%`, `@`, `/`, `(`, `)`, `[`, `]`

For example, with default delimiters, the log message:
```
"abc def ghi"
```
is tokenized into three tokens: `["abc", "def", "ghi"]`

You can customize delimiters when creating a Parser:

```python
parser = Parser(delimiters=r" \t\n,:")  # Custom delimiters
```

#### Token-based pattern matching

**Critical:** Patterns like `.*` only match **within a single token**, not across multiple tokens
or delimiters.

```python
from log_surgeon import Parser

parser = Parser()  # Default delimiters include space
parser.add_var("token", rf"(?<match>d.*)")
parser.compile()

# With "abc def ghi" tokenized as ["abc", "def", "ghi"]
event = parser.parse_event("abc def ghi")

# Matches only "def" (single token starting with 'd')
# Does NOT match "def ghi" (would cross a token boundary)
print(event['match'])  # Output: "def"
```

**In a traditional regex engine**, `d.*` would match `"def ghi"` (everything from 'd' to the end).
**In log-surgeon**, `d.*` matches only `"def"` because patterns cannot cross delimiter boundaries.

#### Why token-based?

Token-based parsing enables:
- **Faster parsing** by reducing the search space
- **Predictable behavior** aligned with log structure
- **Efficient log type generation** for analytics

#### Working with token boundaries

To match across multiple tokens, you must use **character classes** like `[a-zA-Z ]*` instead of
`.*`:

```python
from log_surgeon import Parser

parser = Parser()  # Default delimiters include space

# Using .* - only matches within a single token
parser.add_var("wrong", rf"(?<match>d.*)")  # Matches only "def"

# Using character classes - matches across tokens
parser.add_var("correct", rf"(?<match>d[a-z ]*i)")  # Matches "def ghi"
parser.compile()

event = parser.parse_event("abc def ghi")
print(event['match'])  # Output: "def ghi"
```

**Key rule:** Character classes like `[a-zA-Z]*`, `[a-z ]*`, or `[\w\s]*` can match across token
boundaries, but `.*` cannot.

#### Alternation requires grouping

**CRITICAL:** Alternation (`|`) works differently in log-surgeon compared to traditional regex
engines. You **must** use parentheses to group alternatives.

```python
from log_surgeon import Parser

parser = Parser()

# WRONG: Without grouping - matches "ab" AND ("c" OR "d") AND "ef"
parser.add_var("wrong", rf"(?<word>abc|def)")
# In log-surgeon, this is interpreted as: "ab" + "c|d" + "ef"
# Matches: "abcef" or "abdef" (NOT "abc" or "def")

# CORRECT: With grouping - matches "abc" OR "def"
parser.add_var("correct", rf"(?<word>(abc)|(def))")
# Matches: "abc" or "def"
parser.compile()
```

**In traditional regex engines**, `abc|def` means "abc" OR "def".
**In log-surgeon**, `abc|def` means "ab" + ("c" OR "d") + "ef".

**Key rule:** Always use `(abc)|(def)` syntax for alternation to match complete alternatives.

```python
# More examples:
parser.add_var("level", rf"(?<level>(ERROR)|(WARN)|(INFO))")   # Correct
parser.add_var("status", rf"(?<status>(success)|(failure))")   # Correct
parser.add_var("bad", rf"(?<status>success|failure)")          # Wrong - unexpected behavior
```

#### Optional patterns

For optional patterns, use `{0,1}` instead of `*`:

```python
from log_surgeon import Parser

parser = Parser()

# Avoid using * for optional patterns (matches 0 or more)
parser.add_var("avoid", rf"(?<level>(ERROR)|(WARN))*")  # Can match the empty string or multiple repetitions

# Do not use ? for optional patterns
parser.add_var("avoid2", rf"(?<level>(ERROR)|(WARN))?")  # May not work as expected

# Use {0,1} for optional patterns (matches 0 or 1);
# the braces are doubled because this is an f-string
parser.add_var("optional", rf"(?<level>(ERROR)|(WARN)){{0,1}}")
parser.compile()
```

**Best practice:** Use `{0,1}` for optional elements. Avoid `*` (0 or more) and `?` for optional
matching.

You can also explicitly include delimiters in your pattern:

```python
# To match "def ghi", explicitly include the space delimiter
parser.add_var("multi", rf"(?<match>d\w+\s+\w+)")
# This matches "def " as one token segment, followed by "ghi"
```

Or adjust your delimiters to change tokenization behavior:

```python
# Use only newline as a delimiter to treat entire lines as tokens
parser = Parser(delimiters=r"\n")
```

### Named capture groups

Use named capture groups in regex patterns to extract specific fields:

```python
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
```

The syntax `(?<name>pattern)` creates a capture group whose value can be accessed as
`event['name']`.

**Note:** See [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns) for best practices on
writing regex patterns.

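When a named group matches more than once in a single event (as `system_stack` did in the
stack-trace example above), the values are aggregated into a list. A minimal sketch:

```python
from log_surgeon import Parser

parser = Parser()
parser.add_var("item", rf"item=(?<item>\w+)")
parser.compile()

# The `item` group matches three times in one event
event = parser.parse_event("processed item=a item=b item=c\n")

# Repeated matches for one logical name come back as a list
print(event["item"])  # expected: ['a', 'b', 'c']
```
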
### Using raw f-strings for regex patterns

> **⚠️ STRONGLY RECOMMENDED: Use raw f-strings (`rf"..."`) for all regex patterns.**
>
> While not absolutely required, using regular strings will likely cause escaping issues and
> pattern failures. Raw f-strings prevent these problems.

Raw f-strings combine the benefits of:
- **Raw strings (`r"..."`)**: No need to double-escape regex special characters like `\d`, `\w`,
  `\n`
- **f-strings (`f"..."`)**: Easy interpolation of variables and pattern constants

#### Why use raw f-strings?

```python
# Without raw strings - requires double-escaping
parser.add_var("metric", "value=(\\d+)")  # Hard to read, error-prone

# With raw f-strings - single escaping, clean and readable
parser.add_var("metric", rf"value=(?<value>\d+)")
```

#### Watch out for braces in f-strings

When using f-strings, literal `{` and `}` characters must be escaped by doubling them:

```python
from log_surgeon import Parser, PATTERN

parser = Parser()

# Correct: Escape literal braces in regex
parser.add_var("json", rf"data={{(?<content>[^}}]+)}}")            # Matches: data={...}
parser.add_var("range", rf"range={{(?<min>\d+),(?<max>\d+)}}")     # Matches: range={10,20}

# Using PATTERN constants with interpolation
parser.add_var("ip", rf"IP: (?<ip>{PATTERN.IPV4})")
parser.add_var("float", rf"value=(?<val>{PATTERN.FLOAT})")

# Common regex patterns
parser.add_var("digits", rf"\d+ items")  # No double-escaping needed
parser.add_var("word", rf"name=(?<name>\w+)")
parser.add_var("whitespace", rf"split\s+by\s+spaces")

parser.compile()
```

#### Examples: raw f-strings vs regular strings

```python
# Regular string - requires double-escaping
parser.add_var("path", "path=(?<path>\\w+/\\w+)")  # Hard to read

# Raw f-string - natural regex syntax
parser.add_var("path", rf"path=(?<path>\w+/\w+)")  # Clean and readable

# With interpolation
log_level = "INFO|WARN|ERROR"
parser.add_var("level", rf"(?<level>{log_level})")  # Easy to compose
```

**Recommendation:** Consistently use `rf"..."` for all regex patterns. This approach:
- Avoids double-escaping mistakes that break patterns
- Makes patterns more readable
- Allows easy use of PATTERN constants and variables
- Only requires watching for literal braces `{` and `}` in f-strings (escape as `{{` and `}}`)

Using regular strings (`"..."`) requires double-escaping (e.g., `"\\d+"`), which is error-prone
and hard to read.

### Logical vs. physical names

Internally, log-surgeon uses "physical" names (e.g., `CGPrefix0`, `CGPrefix1`) for capture groups,
while you work with "logical" names (e.g., `user_id`, `thread`). The `GroupNameResolver` handles
this mapping automatically.

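You normally never see physical names, but the resolver can be used directly (see
[GroupNameResolver](#groupnameresolver) below). A minimal sketch; the import path and prefix
string are assumptions:

```python
from log_surgeon import GroupNameResolver  # import path assumed

resolver = GroupNameResolver("CGPrefix")

# Each call mints a fresh physical name for a logical name
physical = resolver.create_new_physical_name("user_id")

print(physical)                                # e.g., CGPrefix0
print(resolver.get_logical_name(physical))     # user_id
print(resolver.get_physical_names("user_id"))  # all physical names mapped to user_id
```
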
### Schema format

The schema defines the delimiters, timestamps, and variables used for parsing:

```
// schema delimiters
delimiters: \t\r\n:,!;%@/\(\)\[\]

// schema timestamps
timestamp:<timestamp_regex>

// schema variables
variable_name:<variable_regex>
```

When using the fluent API (`Parser.add_var()` and `Parser.compile()`), the schema is built
automatically.

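For cases where you want the schema string explicitly, here is a minimal sketch using
`SchemaCompiler` and `Parser.load_schema()` (both documented under [Reference](#reference); the
import paths and the `TS` pattern are illustrative):

```python
from log_surgeon import Parser, SchemaCompiler  # import paths assumed

# Build the schema explicitly instead of through the fluent Parser API
compiler = SchemaCompiler()
compiler.add_timestamp("TS", rf"\d{{4}}-\d{{2}}-\d{{2}} \d{{2}}:\d{{2}}:\d{{2}}")
compiler.add_var("metric", rf"value=(?<value>\d+)")

# Compile the schema text and keep the logical -> physical name mapping
schema = compiler.compile()
resolver = compiler.get_capture_group_name_resolver()

parser = Parser()
parser.load_schema(schema, resolver)

event = parser.parse_event("2024-01-01 10:00:00 value=42\n")
print(event["value"])  # expected: 42
```
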
### Common pitfalls

**Pattern doesn't match anything**
- Check: Are you using `.*` to match across tokens? Use `[a-zA-Z ]*` instead
- Check: Did you forget to call `parser.compile()`?
- Check: Are your delimiters splitting tokens unexpectedly?

**Alternation not working (`abc|def`)**
- Problem: `(?<name>abc|def)` doesn't match "abc" or "def" as expected
- Solution: Use `(?<name>(abc)|(def))` with explicit grouping

**Pattern works in a regex tester but not here**
- Remember: log-surgeon is token-based, not character-based
- Traditional regex engines match across entire strings
- log-surgeon matches within token boundaries (delimited by spaces, colons, etc.)
- Read: [Token-based parsing](#token-based-parsing-and-delimiters)

**Escape sequence errors in Python**
- Problem: `parser.add_var("digits", "(?<num>\d+)")` triggers an invalid-escape-sequence warning
  (a `SyntaxError` in future Python versions) and is easy to get wrong
- Solution: Use `rf"..."` (a raw f-string) instead of `"..."` or `f"..."`
- Example: `parser.add_var("digits", rf"(?<num>\d+)")`

**Optional pattern matching incorrectly**
- Problem: Using `?` or `*` for optional patterns
- Solution: Use `{0,1}` for optional elements
- Example: `(?<level>(ERROR)|(WARN)){0,1}` for an optional log level

---

## Reference

| Task | Syntax |
|------|--------|
| Named capture | `(?<name>pattern)` |
| Alternation | `(?<name>(opt1)\|(opt2))` (NOT `opt1\|opt2`) |
| Optional | `{0,1}` (NOT `?` or `*`) |
| Match across tokens | Use `[a-z ]*` (NOT `.*`) |
| Pattern string | `rf"..."` (raw f-string recommended) |
| All variables | `.select(["*"])` |
| Log type | `.select(["@log_type"])` |
| Original message | `.select(["@log_message"])` |

### Parser

High-level parser for extracting structured data from unstructured log messages.

#### Constructor

- `Parser(delimiters: str = r" \t\r\n:,!;%@/\(\)\[\]")`
  - Initialize a parser with optional custom delimiters
  - Default delimiters include space, tab, newline, and common punctuation

#### Methods

- `add_var(name: str, regex: str, hide_var_name_if_named_group_present: bool = True) -> Parser`
  - Add a variable pattern to the parser's schema
  - Supports named capture groups using `(?<name>)` syntax
  - Use raw f-strings (`rf"..."`) for regex patterns (see
    [Using Raw f-strings](#using-raw-f-strings-for-regex-patterns))
  - Returns self for method chaining

- `add_timestamp(name: str, regex: str) -> Parser`
  - Add a timestamp pattern to the parser's schema
  - Returns self for method chaining

- `compile(enable_debug_logs: bool = False) -> None`
  - Build and initialize the parser with the configured schema
  - Must be called after adding variables and before parsing
  - Set `enable_debug_logs=True` to output debug information to stderr

- `load_schema(schema: str, group_name_resolver: GroupNameResolver) -> None`
  - Load a pre-built schema string to configure the parser

- `parse(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Generator[LogEvent, None, None]`
  - Parse all log events from a string, file object, or stream
  - Accepts strings, text/binary file objects, StringIO, or BytesIO
  - Yields LogEvent objects for each parsed event

- `parse_event(payload: str) -> LogEvent | None`
  - Parse a single log event from a string (convenience method)
  - Wraps `parse()` and returns the first event
  - Returns a LogEvent, or None if no event was found

### LogEvent

Represents a parsed log event with extracted variables.

#### Methods

- `get_log_message() -> str`
  - Get the original log message

- `get_log_type() -> str`
  - Get the generated log type (template) with logical group names

- `get_capture_group(logical_capture_group_name: str, raw_output: bool = False) -> str | list | None`
  - Get the value of a capture group by its logical name
  - If `raw_output=False` (default), single values are unwrapped from lists
  - Returns None if the capture group is not found

- `get_capture_group_str_representation(field: str, raw_output: bool = False) -> str`
  - Get the string representation of a capture group value

- `get_resolved_dict() -> dict[str, str | list]`
  - Get a dictionary with all capture groups using logical (user-defined) names
  - Physical names (CGPrefix*) are converted to logical names
  - Timestamp fields are consolidated under the "timestamp" key
  - Single-value lists are unwrapped to scalar values
  - "@LogType" is excluded from the output

- `__getitem__(key: str) -> str | list`
  - Access capture group values by name (e.g., `event['field_name']`)
  - Shorthand for `get_capture_group(key, raw_output=False)`

- `__str__() -> str`
  - Get a formatted JSON representation of the log event with logical group names
  - Uses `get_resolved_dict()` internally

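A short sketch exercising these accessors on a parsed event:

```python
from log_surgeon import Parser

parser = Parser()
parser.add_var("metric", rf"value=(?<value>\d+)")
parser.compile()

event = parser.parse_event("2024-01-01 INFO: Processing value=42\n")

print(event.get_log_message().strip())  # original message
print(event.get_log_type().strip())     # template with placeholders
print(event.get_capture_group("value"))                   # "42" (single value, unwrapped)
print(event.get_capture_group("value", raw_output=True))  # ["42"] (raw list form)
print(event.get_resolved_dict())        # all groups under logical names
print(event["value"])                   # shorthand for get_capture_group("value")
```
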
### Query

Query builder for parsing log events into structured data formats.

#### Constructor

- `Query(parser: Parser)`
  - Initialize a query with a configured parser

#### Methods

- `select(fields: list[str]) -> Query`
  - Select fields to extract from log events
  - Supports variable names, `"*"` for all variables, `"@log_type"` for the log type, and
    `"@log_message"` for the original message
  - The `"*"` wildcard can be combined with other fields (e.g., `["@log_type", "*"]`)
  - Returns self for method chaining

- `filter(predicate: Callable[[LogEvent], bool]) -> Query`
  - Filter log events using a predicate function
  - The predicate receives a LogEvent and returns True to include it, False to exclude it
  - Returns self for method chaining
  - Example: `query.filter(lambda event: int(event['value']) > 50)`

- `from_(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
  - Set the input source to parse
  - Accepts strings, text/binary file objects, StringIO, or BytesIO
  - Strings are automatically wrapped in StringIO
  - Returns self for method chaining

- `select_from(input: str | TextIO | BinaryIO | io.StringIO | io.BytesIO) -> Query`
  - Alias for `from_()`
  - Returns self for method chaining

- `validate_query() -> Query`
  - Validate that the query is properly configured
  - Returns self for method chaining

- `to_dataframe() -> pd.DataFrame`
  - Convert parsed events to a pandas DataFrame

- `to_df() -> pd.DataFrame`
  - Alias for `to_dataframe()`

- `to_arrow() -> pa.Table`
  - Convert parsed events to a PyArrow Table

- `to_pa() -> pa.Table`
  - Alias for `to_arrow()`

- `get_rows() -> list[list]`
  - Extract rows of field values from parsed events

- `get_vars() -> KeysView[str]`
  - Get all variable names (logical capture group names) defined in the schema

- `get_log_types() -> Generator[str, None, None]`
  - Get all unique log types from parsed events
  - Yields log types in the order they are first encountered
  - Useful for discovering log patterns in your data

- `get_log_type_counts() -> dict[str, int]`
  - Get the count of occurrences for each unique log type
  - Returns a dictionary mapping log types to their counts
  - Useful for analyzing log type distribution

- `get_log_type_with_sample(sample_size: int = 3) -> dict[str, list[str]]`
  - Get sample log messages for each unique log type
  - Returns a dictionary mapping log types to lists of sample messages
  - Useful for understanding which actual messages match each template

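A short sketch chaining several of these methods; the printed shapes follow the descriptions
above:

```python
from log_surgeon import Parser, Query

parser = Parser()
parser.add_var("metric", rf"metric=(?<metric_name>\w+) value=(?<value>\d+)")
parser.compile()

log_data = """
2024-01-01 INFO: metric=cpu value=42
2024-01-01 INFO: metric=memory value=100
"""

query = (
    Query(parser)
    .select(["@log_type", "*"])  # log type plus every schema variable
    .from_(log_data)
    .filter(lambda event: int(event["value"]) >= 50)
    .validate_query()
)

print(query.get_vars())  # logical capture group names defined in the schema
print(query.get_rows())  # one row of selected field values per surviving event
```
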
\"custom_id\",\n rf\"ID-(?<id>[{PATTERN.JAVA_IDENTIFIER_CHARSET}]+)\"\n)\n\nparser.compile()\n```\n\n---\n\n## Development\n\n### Building from source\n\n```bash\n# Clone the repository\ngit clone https://github.com/y-scope/log-surgeon-ffi-py.git\ncd log-surgeon-ffi-py\n\n# Install the project in editable mode\npip install -e .\n\n# Build the extension\ncmake -S . -B build\ncmake --build build\n```\n\n### Running tests\n\n```bash\n# Install test dependencies\npip install pytest\n\n# Run tests\npython -m pytest tests/\n```\n\n---\n\n## License\n\nApache License 2.0 - See [LICENSE](LICENSE) for details.\n\n---\n\n## Links\n\n- [Homepage](https://github.com/y-scope/log-surgeon-ffi-py)\n- [Bug Tracker](https://github.com/y-scope/log-surgeon-ffi-py/issues)\n- [log-surgeon C++ library](https://github.com/y-scope/log-surgeon)\n\n---\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n",
"bugtrack_url": null,
"license": "Apache License 2.0",
"summary": "Python FFI bindings for log-surgeon: high-performance parsing of unstructured logs into structured data",
"version": "0.1.0b4",
"project_urls": {
"Bug Tracker": "https://github.com/y-scope/log-surgeon-ffi-py/issues",
"Homepage": "https://github.com/y-scope/log-surgeon-ffi-py"
},
"split_keywords": [
"logging",
" log-parsing",
" log-analysis",
" structured-data",
" performance",
" observability"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5a9793c6d62614a9984080cff66f5664f08d5b963c98c6bbd108e26ddb307085",
"md5": "d41d2bd9b9093d8ea5ed9cbc572e2d40",
"sha256": "49e1f0712140e8b53d39b0970d76f0291e7ba470ea4df8c7ed41190de9c43114"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "d41d2bd9b9093d8ea5ed9cbc572e2d40",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 339352,
"upload_time": "2025-10-27T16:38:10",
"upload_time_iso_8601": "2025-10-27T16:38:10.766794Z",
"url": "https://files.pythonhosted.org/packages/5a/97/93c6d62614a9984080cff66f5664f08d5b963c98c6bbd108e26ddb307085/log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "3fdff3da814b7f0f078f3e6d81c86c11c90f6446f62ecd340e64c6e5a448638e",
"md5": "18611017e3299da3169c5c808d8ccccb",
"sha256": "46151940c76d82b6bc8567d84b491c3d6dafa1b3577d554dd229bc9c28c2c2e6"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "18611017e3299da3169c5c808d8ccccb",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 367500,
"upload_time": "2025-10-27T16:38:13",
"upload_time_iso_8601": "2025-10-27T16:38:13.131031Z",
"url": "https://files.pythonhosted.org/packages/3f/df/f3da814b7f0f078f3e6d81c86c11c90f6446f62ecd340e64c6e5a448638e/log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4e440479773731d183da1c3acfcde4777c53ee55bc9d99b4309d1c2a5f0ee509",
"md5": "4ffe485d25b62841135e100833a2dd69",
"sha256": "d69b612e89e06b565ee5c1c51e51c760eb6304b32eba9c3507cee8f785e01a48"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "4ffe485d25b62841135e100833a2dd69",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 351110,
"upload_time": "2025-10-27T16:38:16",
"upload_time_iso_8601": "2025-10-27T16:38:16.712690Z",
"url": "https://files.pythonhosted.org/packages/4e/44/0479773731d183da1c3acfcde4777c53ee55bc9d99b4309d1c2a5f0ee509/log_surgeon_ffi-0.1.0b4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "74b97a10e1e130ff31b24edf37f5912eda06d881a2e8593d0add6f07a8e100d9",
"md5": "c42ef923f61f8c73d5b9ad065b30a3e0",
"sha256": "7d00e36a937d1667d7ec38d388fc5daf64866815f1bb7d747c7e82b4b41d6326"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "c42ef923f61f8c73d5b9ad065b30a3e0",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1265909,
"upload_time": "2025-10-27T16:38:18",
"upload_time_iso_8601": "2025-10-27T16:38:18.253028Z",
"url": "https://files.pythonhosted.org/packages/74/b9/7a10e1e130ff31b24edf37f5912eda06d881a2e8593d0add6f07a8e100d9/log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "034899af9b94d3a88f58120225e261868ea5ccaafdb329b371daa9187f53502d",
"md5": "5f9ffb0dab694ab0d9e434e946b8c004",
"sha256": "924ce7da1aa58e7e7409284b76eb7fef0294682f4468b08aff218fb2db158f28"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "5f9ffb0dab694ab0d9e434e946b8c004",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1431122,
"upload_time": "2025-10-27T16:38:19",
"upload_time_iso_8601": "2025-10-27T16:38:19.877086Z",
"url": "https://files.pythonhosted.org/packages/03/48/99af9b94d3a88f58120225e261868ea5ccaafdb329b371daa9187f53502d/log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ec1e7bb7032a170ca87c403af733d886a999b9018634e552d929af3751c8979e",
"md5": "8a319b701eed48c1b33c8069d5d60186",
"sha256": "3b0a36c1da5745e94dcc1ff2d0ecaebf9182043f8d367634f49b9c5361f3d952"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "8a319b701eed48c1b33c8069d5d60186",
"packagetype": "bdist_wheel",
"python_version": "cp310",
"requires_python": ">=3.9",
"size": 1326058,
"upload_time": "2025-10-27T16:38:21",
"upload_time_iso_8601": "2025-10-27T16:38:21.139370Z",
"url": "https://files.pythonhosted.org/packages/ec/1e/7bb7032a170ca87c403af733d886a999b9018634e552d929af3751c8979e/log_surgeon_ffi-0.1.0b4-cp310-cp310-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "6efba8fbf4c654bcd437aad5f7bed67da682af9bbfaf5b87ba49ff65a8bb3057",
"md5": "a1f92c4bf528f05c314b2b6106e889e7",
"sha256": "51e1f35e23996f057cdb6757cf58287f41846e5cce4aa49f7b62504626770ce2"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "a1f92c4bf528f05c314b2b6106e889e7",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 339350,
"upload_time": "2025-10-27T16:38:22",
"upload_time_iso_8601": "2025-10-27T16:38:22.591542Z",
"url": "https://files.pythonhosted.org/packages/6e/fb/a8fbf4c654bcd437aad5f7bed67da682af9bbfaf5b87ba49ff65a8bb3057/log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4f2ad5172f1820c4b5701907924403de1d7b4301a7b89a8d2fc135e5646621a3",
"md5": "6536785b7fcab4265466819a9e580eef",
"sha256": "c60fa11d004817bedff0bf2802946876b6d5fe668b42c835137dcc2e2646f910"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "6536785b7fcab4265466819a9e580eef",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 367498,
"upload_time": "2025-10-27T16:38:23",
"upload_time_iso_8601": "2025-10-27T16:38:23.938708Z",
"url": "https://files.pythonhosted.org/packages/4f/2a/d5172f1820c4b5701907924403de1d7b4301a7b89a8d2fc135e5646621a3/log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "b7d591f1fa624e35700d075b42c388c3daf61d8feb0a1f623941b9cc40e4ad64",
"md5": "76edd55d4262fd92671d94d372504c00",
"sha256": "ee0cb940c3ca50ed68ec44fbb46640ef80704a02a086c36d8c6fb5165a4fa1c0"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "76edd55d4262fd92671d94d372504c00",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 351110,
"upload_time": "2025-10-27T16:38:25",
"upload_time_iso_8601": "2025-10-27T16:38:25.117741Z",
"url": "https://files.pythonhosted.org/packages/b7/d5/91f1fa624e35700d075b42c388c3daf61d8feb0a1f623941b9cc40e4ad64/log_surgeon_ffi-0.1.0b4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c0af151b7559edfca2b8ab9a6a7c4f4bd68f69699a3494fedc00f065dc56aa9b",
"md5": "eeb5eef6abd94f43bd40709b990eac4a",
"sha256": "acda63fc5be39f4fa70500b6685382485897f0528410ac4f2ceb6d8f5c86e17b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "eeb5eef6abd94f43bd40709b990eac4a",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1265911,
"upload_time": "2025-10-27T16:38:26",
"upload_time_iso_8601": "2025-10-27T16:38:26.576461Z",
"url": "https://files.pythonhosted.org/packages/c0/af/151b7559edfca2b8ab9a6a7c4f4bd68f69699a3494fedc00f065dc56aa9b/log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "ce07ae7c8d14f1c90bbff7ad2e49e8f5c8ff4c5710bbc2015122d3d67e63d753",
"md5": "09add42c54812eed8eaf11bf49046310",
"sha256": "651f9e565f1c41464bc8928bf83b81be9ed6b75356bc5f2a6ae6dd74c32b16a3"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "09add42c54812eed8eaf11bf49046310",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1431125,
"upload_time": "2025-10-27T16:38:28",
"upload_time_iso_8601": "2025-10-27T16:38:28.200600Z",
"url": "https://files.pythonhosted.org/packages/ce/07/ae7c8d14f1c90bbff7ad2e49e8f5c8ff4c5710bbc2015122d3d67e63d753/log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4f33f269167f909cc2c45e4e557e5731a92f2027338887d5961c4c025f94ad94",
"md5": "6da0ec55ebdf7a0a6be81395b6e6be58",
"sha256": "023fa13694855b71b92cfafb0bc7b08d54f3f9c7bb4f1b2cf18713413f4aebc5"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "6da0ec55ebdf7a0a6be81395b6e6be58",
"packagetype": "bdist_wheel",
"python_version": "cp311",
"requires_python": ">=3.9",
"size": 1326062,
"upload_time": "2025-10-27T16:38:29",
"upload_time_iso_8601": "2025-10-27T16:38:29.376900Z",
"url": "https://files.pythonhosted.org/packages/4f/33/f269167f909cc2c45e4e557e5731a92f2027338887d5961c4c025f94ad94/log_surgeon_ffi-0.1.0b4-cp311-cp311-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "dffa1339883bba67efce44689851a6a79c8281e580ae0dff67763c60d5605d2d",
"md5": "1a8e8696d9f64575ba3497b431789ae6",
"sha256": "43f3012e955e38b1395b3691e5952a9baee0347fb0600c433d8fbf8e205e488e"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "1a8e8696d9f64575ba3497b431789ae6",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 339403,
"upload_time": "2025-10-27T16:38:30",
"upload_time_iso_8601": "2025-10-27T16:38:30.728274Z",
"url": "https://files.pythonhosted.org/packages/df/fa/1339883bba67efce44689851a6a79c8281e580ae0dff67763c60d5605d2d/log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7103deb181fb7d448cccb2e4706299714f3e1b9aa55527c2d255255048a697cf",
"md5": "8d4fe6d762f5915d255d313c666512a4",
"sha256": "91237e2db3610016566fa6d281ed1229b58c6ccb7e6c3551bd0f27801b72ae09"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "8d4fe6d762f5915d255d313c666512a4",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 367637,
"upload_time": "2025-10-27T16:38:32",
"upload_time_iso_8601": "2025-10-27T16:38:32.310179Z",
"url": "https://files.pythonhosted.org/packages/71/03/deb181fb7d448cccb2e4706299714f3e1b9aa55527c2d255255048a697cf/log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "478bfcfbfa67ab585ca28461f16e94e24e7624d0507b4bbf564d13bd231b4bae",
"md5": "36f0f7c431343f7fa9d3d3e56c59d2f5",
"sha256": "979ecc9610850359d501c88a0c4a8f3e3e466d58cd53348877466c596971e0df"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "36f0f7c431343f7fa9d3d3e56c59d2f5",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 351213,
"upload_time": "2025-10-27T16:38:33",
"upload_time_iso_8601": "2025-10-27T16:38:33.337177Z",
"url": "https://files.pythonhosted.org/packages/47/8b/fcfbfa67ab585ca28461f16e94e24e7624d0507b4bbf564d13bd231b4bae/log_surgeon_ffi-0.1.0b4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "a6b9a10e1fa6ea57d6e3396393f832030d80293825dfcc0e15b302d199f8b90a",
"md5": "a870e22ccc7a6f0a3639537b01f01b2f",
"sha256": "d9093a7d5cc212125ca4ed142f0b780f12e4904be133deacb51cdbc331d45c1b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "a870e22ccc7a6f0a3639537b01f01b2f",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 1265961,
"upload_time": "2025-10-27T16:38:34",
"upload_time_iso_8601": "2025-10-27T16:38:34.423812Z",
"url": "https://files.pythonhosted.org/packages/a6/b9/a10e1fa6ea57d6e3396393f832030d80293825dfcc0e15b302d199f8b90a/log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "57e9c58eecb156b62229ce532ce8ab0abaefbaef87f8bc5bdb3e39b2fc685211",
"md5": "c2d828c9f6fcc58949a69265d34453f0",
"sha256": "3084e098a1a5d599b0a2d6203d43cffc2a52f903e3cb398ac409141a7ee0ca33"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "c2d828c9f6fcc58949a69265d34453f0",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 1431166,
"upload_time": "2025-10-27T16:38:35",
"upload_time_iso_8601": "2025-10-27T16:38:35.749438Z",
"url": "https://files.pythonhosted.org/packages/57/e9/c58eecb156b62229ce532ce8ab0abaefbaef87f8bc5bdb3e39b2fc685211/log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "06deefe535ee1021cb9ddb8fa8fd4b1e1f63b13dd81f3bb838ae3f7dc17d8ed2",
"md5": "6e855fa38dee3f740ad4823ed2ce7e87",
"sha256": "f4512155ea6d885b6eeb8c8ba30f5e58cb2b27565c782fd8ef295d856f151ace"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "6e855fa38dee3f740ad4823ed2ce7e87",
"packagetype": "bdist_wheel",
"python_version": "cp312",
"requires_python": ">=3.9",
"size": 1326131,
"upload_time": "2025-10-27T16:38:36",
"upload_time_iso_8601": "2025-10-27T16:38:36.941029Z",
"url": "https://files.pythonhosted.org/packages/06/de/efe535ee1021cb9ddb8fa8fd4b1e1f63b13dd81f3bb838ae3f7dc17d8ed2/log_surgeon_ffi-0.1.0b4-cp312-cp312-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "9b0fc8593373ab2a5d825b354ef424a85aac2c13bf2e8df462b894bae15a5673",
"md5": "e662ec48f708bb62dcee6efd2cecd9bb",
"sha256": "d25162f8bbfc50bac08c7333efb9a28af8b55dd4682446a946c7852b3ebfb2d8"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "e662ec48f708bb62dcee6efd2cecd9bb",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 339361,
"upload_time": "2025-10-27T16:38:38",
"upload_time_iso_8601": "2025-10-27T16:38:38.101138Z",
"url": "https://files.pythonhosted.org/packages/9b/0f/c8593373ab2a5d825b354ef424a85aac2c13bf2e8df462b894bae15a5673/log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "92820e7e0b07ad358808415594a5cdfe25741a2e3da43f81be61c4d95fe5d771",
"md5": "ded8a788c63ee164134e682cd66d6610",
"sha256": "031c233bfa9c74cc1706150c1e41e5e37c654eded3c6897a9a0f05abf26ece5b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "ded8a788c63ee164134e682cd66d6610",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 367582,
"upload_time": "2025-10-27T16:38:39",
"upload_time_iso_8601": "2025-10-27T16:38:39.536518Z",
"url": "https://files.pythonhosted.org/packages/92/82/0e7e0b07ad358808415594a5cdfe25741a2e3da43f81be61c4d95fe5d771/log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "adcb9f9094ade52939b454cf84023a5f5716a29d5020b8919244b2518f244dd8",
"md5": "998ab75424bb6732c60a56398d07490d",
"sha256": "bcf3b5d92f939c3ea71ceea7c3bdee00fab7ea5da86708a4b2fac16c996696a3"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "998ab75424bb6732c60a56398d07490d",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 351212,
"upload_time": "2025-10-27T16:38:42",
"upload_time_iso_8601": "2025-10-27T16:38:42.011616Z",
"url": "https://files.pythonhosted.org/packages/ad/cb/9f9094ade52939b454cf84023a5f5716a29d5020b8919244b2518f244dd8/log_surgeon_ffi-0.1.0b4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "89197a0dcfea834093112589a3776c848e3ecdfb060ab84a6b1bf1a406a195c3",
"md5": "07c3dc8ae0765e774a7f81af1e59959f",
"sha256": "3ab6cf3ba2ff2c1717a569f5055aa217299418aea7d8ad48112a4db039f1cae3"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "07c3dc8ae0765e774a7f81af1e59959f",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 1265955,
"upload_time": "2025-10-27T16:38:43",
"upload_time_iso_8601": "2025-10-27T16:38:43.219212Z",
"url": "https://files.pythonhosted.org/packages/89/19/7a0dcfea834093112589a3776c848e3ecdfb060ab84a6b1bf1a406a195c3/log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "7fd8f3d028b1925dd93eae879987f2a737aa0e764f25cdac2b88e1f6877a9788",
"md5": "7b6cfdd93f6a98f01518b275b7ce0f5b",
"sha256": "61d542ce394318af73197f8809e738721d259386a9be41d8ecc28fb83d4cf2cc"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "7b6cfdd93f6a98f01518b275b7ce0f5b",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 1431163,
"upload_time": "2025-10-27T16:38:44",
"upload_time_iso_8601": "2025-10-27T16:38:44.463094Z",
"url": "https://files.pythonhosted.org/packages/7f/d8/f3d028b1925dd93eae879987f2a737aa0e764f25cdac2b88e1f6877a9788/log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "669069c33f2e9c2577f5f46f3684de017abb2439b89ee47d0bc2e2b70466ca86",
"md5": "4bed7d66b47bf496a6056617d3537581",
"sha256": "42da0434c6e145bfa6b755fa81276160e113018b826b043314463fb9e9d9feaa"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "4bed7d66b47bf496a6056617d3537581",
"packagetype": "bdist_wheel",
"python_version": "cp313",
"requires_python": ">=3.9",
"size": 1326171,
"upload_time": "2025-10-27T16:38:45",
"upload_time_iso_8601": "2025-10-27T16:38:45.747890Z",
"url": "https://files.pythonhosted.org/packages/66/90/69c33f2e9c2577f5f46f3684de017abb2439b89ee47d0bc2e2b70466ca86/log_surgeon_ffi-0.1.0b4-cp313-cp313-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "1e0b6dce3ec6af13b85904bce664e934a5f8d68567f1711cf1a25a20fe444abb",
"md5": "2e4b37c1e8b78075d30523977aa9a218",
"sha256": "316d8b74384761b65d5325c7abe79609f25fc78624f52824185f543bc7175367"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"has_sig": false,
"md5_digest": "2e4b37c1e8b78075d30523977aa9a218",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 339348,
"upload_time": "2025-10-27T16:38:46",
"upload_time_iso_8601": "2025-10-27T16:38:46.957821Z",
"url": "https://files.pythonhosted.org/packages/1e/0b/6dce3ec6af13b85904bce664e934a5f8d68567f1711cf1a25a20fe444abb/log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "4d3dff25e2768a894103983450b00e9674b78359d71e8f44671e35da1c84277b",
"md5": "5da813ff7e7c12dccaf6089c839a09b6",
"sha256": "5ae6c133fe4e2e7ece46431c360eaaaa14b5a8c093b978fb6a0c14df253f4b8b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl",
"has_sig": false,
"md5_digest": "5da813ff7e7c12dccaf6089c839a09b6",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 367503,
"upload_time": "2025-10-27T16:38:47",
"upload_time_iso_8601": "2025-10-27T16:38:47.969148Z",
"url": "https://files.pythonhosted.org/packages/4d/3d/ff25e2768a894103983450b00e9674b78359d71e8f44671e35da1c84277b/log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_i686.manylinux2014_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "d9a7bad78b0e1356d3219e7b0a1dd401fe08ad6dc92644550ffc13c9bb428d1b",
"md5": "ce494ff4781f1187838ede47a128e5fb",
"sha256": "7474c0c4e65f7b277cebb9dafd82f82d3bc1bb9aff46f57e0fb86204e2d58e33"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"has_sig": false,
"md5_digest": "ce494ff4781f1187838ede47a128e5fb",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 351108,
"upload_time": "2025-10-27T16:38:49",
"upload_time_iso_8601": "2025-10-27T16:38:49.017187Z",
"url": "https://files.pythonhosted.org/packages/d9/a7/bad78b0e1356d3219e7b0a1dd401fe08ad6dc92644550ffc13c9bb428d1b/log_surgeon_ffi-0.1.0b4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "20d7f19e12bf8ba83274171dc3d785a50c3461287180d87ccdaca01952f5cd0f",
"md5": "375ef90bf5279a0abbd8433d129a8de5",
"sha256": "e0e13f635da79b02ecdc298840cb7a5b0e825854f4c03c3a33e8d2fa29985078"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_aarch64.whl",
"has_sig": false,
"md5_digest": "375ef90bf5279a0abbd8433d129a8de5",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1265907,
"upload_time": "2025-10-27T16:38:50",
"upload_time_iso_8601": "2025-10-27T16:38:50.136066Z",
"url": "https://files.pythonhosted.org/packages/20/d7/f19e12bf8ba83274171dc3d785a50c3461287180d87ccdaca01952f5cd0f/log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_aarch64.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "c03fd81ea6fed39c2beae484a3a63bb27e1bacd2854d6d78623e856033306a16",
"md5": "bb69446610c7603153a64c1fa3e10785",
"sha256": "0164e56dbaccad540f1cd3dd035d5853f8fbd76e6b186c3b501a85f1cfda6c3b"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_i686.whl",
"has_sig": false,
"md5_digest": "bb69446610c7603153a64c1fa3e10785",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1431117,
"upload_time": "2025-10-27T16:38:51",
"upload_time_iso_8601": "2025-10-27T16:38:51.426058Z",
"url": "https://files.pythonhosted.org/packages/c0/3f/d81ea6fed39c2beae484a3a63bb27e1bacd2854d6d78623e856033306a16/log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_i686.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": "",
"digests": {
"blake2b_256": "11ecfc8446edcff6399b8e099db9ea78ff80d27127dae952bf7d13b5e4a20c35",
"md5": "5f797b1cdd12bcbbc946ed41b6782074",
"sha256": "d0956dfa579acbd51810bcd6aadbe7c5eb6548a70645cdcdd97951670d667a27"
},
"downloads": -1,
"filename": "log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_x86_64.whl",
"has_sig": false,
"md5_digest": "5f797b1cdd12bcbbc946ed41b6782074",
"packagetype": "bdist_wheel",
"python_version": "cp39",
"requires_python": ">=3.9",
"size": 1326057,
"upload_time": "2025-10-27T16:38:53",
"upload_time_iso_8601": "2025-10-27T16:38:53.007390Z",
"url": "https://files.pythonhosted.org/packages/11/ec/fc8446edcff6399b8e099db9ea78ff80d27127dae952bf7d13b5e4a20c35/log_surgeon_ffi-0.1.0b4-cp39-cp39-musllinux_1_2_x86_64.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-27 16:38:10",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "y-scope",
"github_project": "log-surgeon-ffi-py",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "log-surgeon-ffi"
}