sakurs

Name: sakurs
Version: 0.1.1 (PyPI)
Summary: Fast, parallel sentence boundary detection using Delta-Stack Monoid algorithm
Home page: https://github.com/sog4be/sakurs
Author: sog4be <163720533+sog4be@users.noreply.github.com>
Uploaded: 2025-07-27 15:33:39
Requires Python: >=3.9
License: MIT
Keywords: nlp, sentence, boundary, detection, tokenization, parallel
Requirements: none recorded
# Sakurs Python Bindings

High-performance sentence boundary detection for Python using the Delta-Stack Monoid algorithm.

## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [API Reference](#api-reference)
  - [Functions](#functions)
  - [Classes](#classes)
- [Supported Languages](#supported-languages)
- [Performance Tips](#performance-tips)
- [Benchmarks](#benchmarks)
- [Error Handling](#error-handling)
- [Development](#development)
  - [Building from Source](#building-from-source)
  - [Development Workflow](#development-workflow)
  - [Troubleshooting](#troubleshooting)
- [License](#license)

## Installation

Install from PyPI:

```bash
pip install sakurs
```

Alternatively, build from source:
```bash
git clone https://github.com/sog4be/sakurs.git
cd sakurs/sakurs-py
uv pip install -e .
```

**Requirements**: Python 3.9 or later

## Quick Start

```python
import sakurs
from pathlib import Path

# Simple sentence splitting
sentences = sakurs.split("Hello world. This is a test.")
print(sentences)  # ['Hello world.', 'This is a test.']

# Process files directly
sentences = sakurs.split("document.txt")  # Path as string
sentences = sakurs.split(Path("document.txt"))  # Path object

# Specify language
sentences = sakurs.split("これは日本語です。テストです。", language="ja")
print(sentences)  # ['これは日本語です。', 'テストです。']

# Get detailed output with offsets
text = "Hello world. This is a test."
results = sakurs.split(text, return_details=True)
for sentence in results:
    print(f"{sentence.text} [{sentence.start}:{sentence.end}]")

# Memory-efficient processing for large files
for sentence in sakurs.split_large_file("huge_corpus.txt", max_memory_mb=50):
    process(sentence)  # Process each sentence as it's found

# Responsive iteration (loads all, yields incrementally)  
for sentence in sakurs.iter_split("document.txt"):
    print(sentence)  # Get results as they're processed
```

## API Reference

### Table of Contents
- [Functions](#functions)
  - [`sakurs.split`](#sakurssplit)
  - [`sakurs.iter_split`](#sakursiter_split)
  - [`sakurs.split_large_file`](#sakurssplit_large_file)
  - [`sakurs.load`](#sakursload)
  - [`sakurs.supported_languages`](#sakurssupported_languages)
- [Classes](#classes)
  - [`SentenceSplitter`](#sakurssentencesplitter)
  - [`Sentence`](#sakurssentence)
  - [`LanguageConfig`](#sakurslanguageconfig)

### Functions

#### `sakurs.split`
Split text or file into sentences.

**Signature:**
```python
sakurs.split(
    input,
    *,
    language=None,
    language_config=None,
    threads=None,
    chunk_kb=None,
    parallel=False,
    execution_mode="adaptive",
    return_details=False,
    encoding="utf-8"
)
```

**Parameters:**
- `input` (str | Path | TextIO | BinaryIO): Text string, file path, or file-like object
- `language` (str, optional): Language code ("en", "ja")
- `language_config` (LanguageConfig, optional): Custom language configuration
- `threads` (int, optional): Number of threads (None for auto)
- `chunk_kb` (int, optional): Chunk size in KB for parallel processing (default: 256)
- `parallel` (bool): Force parallel processing even for small inputs
- `execution_mode` (str): "sequential", "parallel", or "adaptive" (default)
- `return_details` (bool): Return Sentence objects with metadata instead of strings
- `encoding` (str): Text encoding for file inputs (default: "utf-8")

**Returns:** List[str] or List[Sentence] if return_details=True

#### `sakurs.iter_split`
Process input and return sentences as an iterator. Loads entire input but yields incrementally.

**Signature:**
```python
sakurs.iter_split(
    input,
    *,
    language=None,
    language_config=None,
    threads=None,
    chunk_kb=None,
    encoding="utf-8"
)
```

**Parameters:** Same as `split()` except no `return_details` parameter

**Returns:** Iterator[str] - Iterator yielding sentences

#### `sakurs.split_large_file`
Process large files with limited memory usage.

**Signature:**
```python
sakurs.split_large_file(
    file_path,
    *,
    language=None,
    language_config=None,
    max_memory_mb=100,
    overlap_size=1024,
    encoding="utf-8"
)
```

**Parameters:**
- `file_path` (str | Path): Path to the file
- `language` (str, optional): Language code
- `language_config` (LanguageConfig, optional): Custom language configuration  
- `max_memory_mb` (int): Maximum memory to use in MB (default: 100)
- `overlap_size` (int): Bytes to overlap between chunks (default: 1024)
- `encoding` (str): File encoding (default: "utf-8")

**Returns:** Iterator[str] - Iterator yielding sentences
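The memory bound works by reading the file in bounded chunks, with a small overlap between consecutive chunks so that a sentence spanning a chunk edge is still seen in full. The following is a simplified pure-Python sketch of that chunk-plus-overlap mechanism, not sakurs's actual implementation; `read_chunks` and its parameters are illustrative only:

```python
from io import StringIO

def read_chunks(f, chunk_size, overlap):
    """Yield (text, start_offset) windows; consecutive windows share
    `overlap` characters so content near an edge appears whole somewhere."""
    buf = f.read(chunk_size)
    offset = 0
    while buf:
        nxt = f.read(chunk_size)
        yield buf, offset
        if not nxt:
            break
        # Carry the tail of this window into the next one.
        offset += len(buf) - overlap
        buf = buf[-overlap:] + nxt

text = "One. Two. Three. Four. Five."
windows = list(read_chunks(StringIO(text), chunk_size=10, overlap=4))

# Every character position is covered by at least one window.
covered = set()
for w, off in windows:
    covered.update(range(off, off + len(w)))
assert covered == set(range(len(text)))
```

Memory use is bounded by the window size rather than the file size, which is the trade-off `max_memory_mb` and `overlap_size` expose.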

#### `sakurs.load`
Create a processor instance for repeated use.

**Signature:**
```python
sakurs.load(
    language,
    *,
    threads=None,
    chunk_kb=None,
    execution_mode="adaptive"
)
```

**Parameters:**
- `language` (str): Language code ("en" or "ja")
- `threads` (int, optional): Number of threads
- `chunk_kb` (int, optional): Chunk size in KB (default: 256)
- `execution_mode` (str): Processing mode

**Returns:** SentenceSplitter instance

#### `sakurs.supported_languages`
Get list of supported languages.

**Signature:**
```python
sakurs.supported_languages()
```

**Returns:** List[str] - Supported language codes

### Classes

#### `sakurs.SentenceSplitter`
Main sentence splitter class for sentence boundary detection.

**Constructor Parameters:**
- `language` (str, optional): Language code
- `language_config` (LanguageConfig, optional): Custom language configuration
- `threads` (int, optional): Number of threads
- `chunk_kb` (int, optional): Chunk size in KB (default: 256)
- `execution_mode` (str): "sequential", "parallel", or "adaptive"
- `streaming` (bool): Enable streaming mode
- `stream_chunk_mb` (int): Chunk size in MB for streaming mode

**Methods:**
- `split(input, *, return_details=False, encoding="utf-8")`: Split text or file into sentences
- `iter_split(input, *, encoding="utf-8")`: Return iterator over sentences
- `__enter__()` / `__exit__()`: Context manager support

#### `sakurs.Sentence`
Sentence with metadata (returned when `return_details=True`).

**Attributes:**
- `text` (str): The sentence text
- `start` (int): Character offset of sentence start
- `end` (int): Character offset of sentence end
- `confidence` (float): Confidence score (default: 1.0)
- `metadata` (dict): Additional metadata
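Since `start` and `end` are character offsets, slicing the original input should reproduce `text` (assuming `end` is exclusive, per Python slicing convention). A minimal pure-Python stand-in mirroring the documented attributes (the real class is provided by the extension module):

```python
from dataclasses import dataclass, field

@dataclass
class Sentence:
    text: str
    start: int
    end: int
    confidence: float = 1.0
    metadata: dict = field(default_factory=dict)

doc = "Hello world. This is a test."
# Offsets as a detailed split result might report them (illustrative values):
first = Sentence(text="Hello world.", start=0, end=12)
assert doc[first.start:first.end] == first.text
```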

#### `sakurs.LanguageConfig`
Language configuration for custom rules.

**Class Methods:**
- `from_toml(path)`: Load configuration from TOML file
- `to_toml(path)`: Save configuration to TOML file

**Attributes:**
- `code` (str): Language code
- `name` (str): Language name
- `terminators` (TerminatorConfig): Sentence terminator rules
- `ellipsis` (EllipsisConfig): Ellipsis handling rules
- `abbreviations` (AbbreviationConfig): Abbreviation rules
- `enclosures` (EnclosureConfig): Enclosure (quotes, parentheses) rules
- `suppression` (SuppressionConfig): Pattern suppression rules
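The TOML schema itself is not documented here, so the following is a purely hypothetical sketch of how the attributes above might map to TOML tables; every key name below is an assumption, not sakurs's actual format:

```toml
# Hypothetical layout only -- key names are illustrative, not sakurs's schema.
code = "en"
name = "English"

[terminators]
chars = [".", "!", "?"]

[abbreviations]
titles = ["Dr", "Mr", "Mrs"]

[enclosures]
pairs = [["(", ")"], ["\"", "\""]]
```

Consult a configuration produced by `to_toml()` for the authoritative key names.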

## Supported Languages

- English (`en`, `english`)
- Japanese (`ja`, `japanese`)

## Performance Tips

1. **Choose the right function for your use case**:
   ```python
   # For small to medium texts - use split()
   sentences = sakurs.split(text)
   
   # For responsive processing - use iter_split()
   for sentence in sakurs.iter_split(document):
       process_immediately(sentence)
   
   # For huge files with memory constraints - use split_large_file()
   for sentence in sakurs.split_large_file("10gb_corpus.txt", max_memory_mb=100):
       index_sentence(sentence)
   ```

2. **Reuse SentenceSplitter instances**: Create once, use many times
   ```python
   processor = sakurs.load("en", threads=4)
   for document in documents:
       sentences = processor.split(document)
   ```

3. **Configure for your workload**: 
   ```python
   # For CPU-bound batch processing
   processor = sakurs.load("en", threads=8, execution_mode="parallel")
   
   # For I/O-bound or interactive use
   processor = sakurs.load("en", threads=2, execution_mode="adaptive")
   
   # For memory-constrained environments
   processor = sakurs.SentenceSplitter(language="en", streaming=True, stream_chunk_mb=5)
   ```

4. **Adjust chunk size for document characteristics**:
   ```python
   # For texts with many short sentences
   sentences = sakurs.split(text, chunk_kb=64)
   
   # For texts with long sentences
   sentences = sakurs.split(text, chunk_kb=512)
   ```
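Chunked processing parallelizes cleanly because each chunk can be summarized independently and the summaries merged associatively. The following is a deliberately simplified pure-Python illustration of the delta-stack idea (tracking only parenthesis depth, with `.` as the sole terminator); it is a sketch of the concept, not the library's actual implementation:

```python
from functools import reduce

def scan(chunk, base):
    """Summarize one chunk: net depth change plus candidate boundaries,
    each tagged with the depth relative to the chunk's (unknown) start."""
    depth, cands = 0, []
    for i, ch in enumerate(chunk):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == ".":
            cands.append((base + i, depth))  # real boundary only if depth resolves to 0
    return depth, cands

def combine(a, b):
    """Associative merge: shift b's relative depths by a's net delta."""
    da, ca = a
    db, cb = b
    return da + db, ca + [(off, d + da) for off, d in cb]

text = "Hello world. (An aside. Still inside.) Done."

# Sequential reference: scan the whole text as one chunk.
_, seq = scan(text, 0)
expected = [off for off, d in seq if d == 0]

# "Parallel" version: summarize three arbitrary chunks, then reduce.
cuts = [0, 15, 30, len(text)]
parts = [scan(text[a:b], a) for a, b in zip(cuts, cuts[1:])]
_, merged = reduce(combine, parts)
assert [off for off, d in merged if d == 0] == expected
```

Because `combine` is associative, the per-chunk summaries can be computed on separate threads and reduced in any grouping, which is what lets chunk size trade off scheduling overhead against load balance.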

## Benchmarks

Sakurs demonstrates significant performance improvements over existing Python sentence segmentation libraries. Benchmarks are run automatically in CI and results are displayed in GitHub Actions job summaries.

### Running Benchmarks Locally

To run performance benchmarks comparing sakurs with other libraries:

```bash
# Install benchmark dependencies
uv pip install -e ".[benchmark]"

# Run all benchmarks
pytest benchmarks/ --benchmark-only

# Run specific language benchmarks
pytest benchmarks/test_benchmark_english.py --benchmark-only
pytest benchmarks/test_benchmark_japanese.py --benchmark-only
```

### Benchmark Libraries

- **English**: Compared against [PySBD](https://github.com/nipunsadvilkar/pySBD)
- **Japanese**: Compared against [ja_sentence_segmenter](https://github.com/wwwcojp/ja_sentence_segmenter)

## Error Handling

```python
import sakurs

# Language errors
try:
    processor = sakurs.load("unsupported_language")
except sakurs.InvalidLanguageError as e:
    print(f"Language error: {e}")

# File errors
try:
    sentences = sakurs.split("nonexistent.txt")
except sakurs.FileNotFoundError as e:
    print(f"File error: {e}")

# Configuration errors
try:
    config = sakurs.LanguageConfig.from_toml("invalid.toml")
except sakurs.ConfigurationError as e:
    print(f"Config error: {e}")

# The library handles edge cases gracefully
sentences = sakurs.split("")  # Returns []
sentences = sakurs.split("No punctuation")  # Returns ["No punctuation"]
```

## Development

This package is built with PyO3 and maturin.

### Building from Source

For development, we recommend building and installing wheels rather than using editable installs:

```bash
# Build the wheel
maturin build --release --features extension-module

# Install the wheel (force reinstall to ensure updates)
uv pip install --force-reinstall target/wheels/*.whl
```

**Important Note**: Avoid using `pip install -e .` or `maturin develop` as they can lead to stale binaries that don't reflect Rust code changes. The editable install mechanism doesn't properly track changes in the compiled Rust extension module.

### Development Workflow

1. Make changes to the Rust code
2. Build the wheel: `maturin build --release --features extension-module`
3. Install the wheel: `uv pip install --force-reinstall target/wheels/*.whl`
4. Run tests: `python -m pytest tests/`

For convenience, you can use the Makefile from the project root:
```bash
make py-dev  # Builds and installs the wheel
make py-test # Builds, installs, and runs tests
```

### Troubleshooting

If your changes aren't reflected after rebuilding:
- Check if you have an editable install: `uv pip show sakurs` (look for "Editable project location")
- Uninstall completely: `uv pip uninstall sakurs -y`
- Reinstall from wheel as shown above
- Use `.venv/bin/python` directly instead of `uv run` to avoid automatic editable install restoration

## License

MIT License - see LICENSE file for details.

            
