sluggi

Name	sluggi
Version	0.1.2
Summary	A modern, high-performance Python library for turning text into clean, URL-safe slugs.
upload_time	2025-10-13 20:40:57
author	atillaguzel
requires_python	>=3.9
license	MIT
keywords	slug slugify url text python unicode transliteration cli
requirements	No requirements were recorded.

# sluggi

**sluggi**: a modern, fast Python library and CLI for turning any text into clean, URL-safe slugs.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/sluggi.svg?logo=pypi)](https://pypi.org/project/sluggi/)
[![CI](https://github.com/blip-box/sluggi/actions/workflows/ci.yml/badge.svg)](https://github.com/blip-box/sluggi/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/blip-box/sluggi/branch/main/graph/badge.svg)](https://codecov.io/gh/blip-box/sluggi)
[![Python Version](https://img.shields.io/pypi/pyversions/sluggi.svg)](https://pypi.org/project/sluggi/)
[![Changelog](https://img.shields.io/badge/changelog-md-blue)](https://github.com/blip-box/sluggi/releases)



> Inspired by slugify, reimagined for speed, Unicode, and robust parallel batch processing.

---

## Table of Contents
- [Features](#features)
- [Modular Slugification Pipeline](#modular-slugification-pipeline)
- [Installation](#installation)
- [Usage](#usage)
- [API Reference](#api-reference)
- [Advanced Usage & Performance Tips](#advanced-usage--performance-tips)
- [Command-Line Interface (CLI)](#command-line-interface-cli)
- [Development & Contributing](#development--contributing)
- [Performance & Benchmarks](#performance--benchmarks)
- [License](#license)
- [See Also](#see-also)

---

## Features
- 🚀 **Fast:** Optimized for speed with minimal dependencies.
- 🌍 **Unicode & Emoji:** Handles dozens of scripts, emoji, and edge cases out of the box.
- 🔧 **Customizable:** Define your own character mappings and rules.
- 🧵 **Parallel Batch:** Batch slugification in serial, thread, or process mode; process mode spreads work across CPU cores.
- ⚡ **Async Support:** Full asyncio-compatible API for modern Python apps.
- 🖥️ **CLI Tool:** Powerful, colorized CLI for quick slug generation and batch jobs.
- 🔒 **Safe Output:** Always generates URL-safe, predictable slugs.
- 🧩 **Extensible API:** Easy to use and extend.
- ✅ **CI & Pre-commit:** Linting, formatting, and tests run automatically.

## Modular Slugification Pipeline

sluggi processes text through a modular pipeline of single-responsibility functions, making the codebase more readable, maintainable, and extensible. Each step in the pipeline performs a distinct transformation, allowing for easy customization and extension.

**Pipeline Steps:**

1. **normalize_unicode(text)**
   Normalize Unicode text to Normalization Form KC (NFKC: compatibility decomposition followed by canonical composition).
2. **decode_html_entities_and_refs(text)**
   Decode HTML entities and character references to their Unicode equivalents.
3. **convert_emojis(text)**
   Replace emojis with their textual representations.
4. **transliterate_text(text)**
   Transliterate non-ASCII characters to ASCII (where possible).
5. **apply_custom_replacements(text, custom_map)**
   Apply user-defined or staged character/string replacements.
6. **extract_words(text, word_regex)**
   Extract words using a customizable regex pattern.
7. **filter_stopwords(words, stopwords)**
   Remove unwanted words (e.g., stopwords) from the list.
8. **join_words(words, separator)**
   Join words using the specified separator.
9. **to_lowercase(text, lowercase)**
   Convert the result to lowercase if requested.
10. **strip_separators(text, separator)**
    Remove leading/trailing separators.
11. **smart_truncate(text, max_length, separator)**
    Optionally truncate the slug at a word boundary.
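
Step 11 is the least obvious of these, so here is a minimal sketch of word-boundary truncation. It is an illustrative reimplementation, not sluggi's source: the function name mirrors the step above, and the edge-case behavior (a non-positive `max_length` disables truncation; a single word longer than `max_length` is hard-cut) is an assumption.

```python
def smart_truncate(text: str, max_length: int, separator: str = "-") -> str:
    """Shorten a slug to at most max_length characters without splitting a word.

    Illustrative sketch; sluggi's own implementation may differ.
    """
    # Assumption: max_length <= 0 means "no limit".
    if max_length <= 0 or len(text) <= max_length:
        return text
    cut = text[:max_length]
    # The cut falls exactly before a separator, so the prefix is made of whole words.
    if text.startswith(separator, max_length):
        return cut
    # Otherwise back off to the last separator inside the prefix.
    idx = cut.rfind(separator)
    if idx == -1:
        # Assumption: a single over-long first word is hard-cut.
        return cut
    return cut[:idx]


print(smart_truncate("the-quick-brown-fox", 12))  # the-quick
```

Backing off to the last separator means the result can be noticeably shorter than `max_length`, which is the usual trade-off for never emitting a half word.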

**Processing Flow:**

    Input Text
      ↓
    normalize_unicode
      ↓
    decode_html_entities_and_refs
      ↓
    convert_emojis
      ↓
    transliterate_text
      ↓
    apply_custom_replacements
      ↓
    extract_words
      ↓
    filter_stopwords
      ↓
    join_words
      ↓
    to_lowercase
      ↓
    strip_separators
      ↓
    smart_truncate
      ↓
    Final Slug

This modular approach makes it easy to add, remove, or modify steps in the pipeline. Each function is pure and well-documented. See the API docs and source for details on customizing or extending the pipeline.
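
To make the data flow concrete, here is a minimal, self-contained sketch of the same step order using only the standard library. It is not sluggi's code: `slugify_sketch` is a hypothetical name, the emoji step is omitted, and `transliterate_text` is a crude accent-stripping stand-in (NFKD decomposition plus ASCII encoding) that deletes scripts such as Greek or Cyrillic instead of transliterating them.

```python
import html
import re
import unicodedata
from typing import Iterable, Mapping, Optional


def normalize_unicode(text: str) -> str:
    return unicodedata.normalize("NFKC", text)


def decode_html_entities_and_refs(text: str) -> str:
    return html.unescape(text)  # "&amp;" -> "&", "&#233;" -> "é"


def transliterate_text(text: str) -> str:
    # Crude stand-in: decompose accents (NFKD) and drop every non-ASCII code point.
    # Unlike a real transliterator, this deletes Cyrillic, Greek, CJK, etc.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def apply_custom_replacements(text: str, custom_map: Optional[Mapping[str, str]]) -> str:
    for old, new in (custom_map or {}).items():
        text = text.replace(old, new)
    return text


def slugify_sketch(
    text: str,
    separator: str = "-",
    custom_map: Optional[Mapping[str, str]] = None,
    stopwords: Optional[Iterable[str]] = None,
    lowercase: bool = True,
    word_regex: str = r"\w+",
) -> str:
    text = normalize_unicode(text)
    text = decode_html_entities_and_refs(text)
    # convert_emojis is omitted from this sketch.
    text = transliterate_text(text)
    text = apply_custom_replacements(text, custom_map)
    words = re.findall(word_regex, text)
    if stopwords:
        # Case-insensitive matching when lowercasing, as the API reference describes.
        fold = str.lower if lowercase else (lambda w: w)
        stop = {fold(w) for w in stopwords}
        words = [w for w in words if fold(w) not in stop]
    slug = separator.join(words)
    if lowercase:
        slug = slug.lower()
    return slug.strip(separator)


print(slugify_sketch("Fish &amp; Chips", custom_map={"&": "and"}))  # fish-and-chips
print(slugify_sketch("Café Déjà Vu"))  # cafe-deja-vu
```

In this sketch, `custom_map` runs after the lossy transliteration step, so a key such as `"ä"` would never match; sluggi describes its replacements as "user-defined or staged", and its actual ordering may differ.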

## Installation
Install from PyPI:
```bash
pip install sluggi
```

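For the CLI and development extras, install from a local clone of the repository (the quotes stop shells such as zsh from expanding the brackets):
```bash
pip install ".[cli,dev]"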
```

## Usage
```python
from sluggi import slugify, batch_slugify

slug = slugify("Hello, world!")
print(slug)  # hello-world

# Batch processing (serial by default; pass parallel=True for parallel execution)
slugs = batch_slugify(["Hello, world!", "Привет мир"])
print(slugs)  # ['hello-world', 'privet-mir']

# Advanced: Parallel processing
slugs = batch_slugify(["foo", "bar"], parallel=True, mode="process", workers=2)

# Stopwords (exclude common words from slugs)
slug = slugify("The quick brown fox jumps", stopwords=["the", "fox"])
print(slug)  # quick-brown-jumps

slugs = batch_slugify([
    "The quick brown fox jumps",
    "Jump over the lazy dog"
], stopwords=["the", "over", "dog"])
print(slugs)  # ['quick-brown-fox-jumps', 'jump-lazy']

# Custom regex pattern for word extraction (e.g., only extract capitalized words).
# Pass lowercase=False to keep the original capitalization.
slug = slugify("The Quick Brown Fox", word_regex=r"[A-Z][a-z]+", lowercase=False)
print(slug)  # The-Quick-Brown-Fox

# Use in batch_slugify
slugs = batch_slugify([
    "The Quick Brown Fox",
    "Jump Over The Lazy Dog"
], word_regex=r"[A-Z][a-z]+", lowercase=False)
print(slugs)  # ['The-Quick-Brown-Fox', 'Jump-Over-The-Lazy-Dog']
```

### Async Usage
Requires Python 3.9+ (sluggi's minimum supported version).
```python
import asyncio
from sluggi import async_slugify, async_batch_slugify

async def main():
    slug = await async_slugify("Hello, world!")
    slugs = await async_batch_slugify(["Hello, world!", "Привет мир"], parallel=True)
    print(slug)   # hello-world
    print(slugs)  # ['hello-world', 'privet-mir']

asyncio.run(main())
```

### Custom Separator
```python
slug = slugify("Hello, world!", separator="_")
print(slug)  # hello_world
```

### Stopwords
```python
slug = slugify("The quick brown fox", stopwords=["the", "fox"])
print(slug)  # quick-brown
```

### Custom Mapping
```python
slug = slugify("ä ö ü", custom_map={"ä": "ae", "ö": "oe", "ü": "ue"})
print(slug)  # ae-oe-ue
```
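
One plausible way to apply a `custom_map` is a single regex pass that tries longer keys first, so a multi-character key is not pre-empted by a shorter key it contains, and replacement text is never re-scanned. This is a sketch of that technique under those assumptions, not sluggi's documented behavior.

```python
import re
from typing import Mapping


def apply_custom_replacements(text: str, custom_map: Mapping[str, str]) -> str:
    """Replace every key of custom_map found in text with its value, in one pass.

    Longer keys are tried first, so "C++" wins over "C"; replacement text is
    never re-scanned, so {"a": "b", "b": "c"} turns "ab" into "bc", not "cc".
    """
    keys = sorted((key for key in custom_map if key), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: custom_map[match.group(0)], text)


print(apply_custom_replacements("ä ö ü", {"ä": "ae", "ö": "oe", "ü": "ue"}))  # ae oe ue
```

Compared with chained `str.replace` calls, the single pass makes the result independent of the dictionary's insertion order.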

---

## API Reference

### `slugify`

| Argument      | Type           | Default   | Description                                                                 |
|---------------|----------------|-----------|-----------------------------------------------------------------------------|
| text          | str            | —         | The input string to slugify.                                                |
| separator     | str            | "-"      | Word separator in the slug.                                                 |
| custom_map    | dict           | None      | Custom character mappings.                                                  |
| stopwords     | Iterable[str]  | None      | Words to exclude from the slug (case-insensitive if `lowercase=True`).      |
| lowercase     | bool           | True      | Convert result to lowercase.                                                |
| word_regex    | str            | None      | Custom regex pattern for word extraction (default: `r'\w+'`).              |
| process_emoji | bool           | True      | If `False`, disables emoji-to-name conversion for max performance.          |

**Returns:** `str` (slugified string)

**Example:**
```python
slug = slugify("The quick brown fox", stopwords=["the", "fox"])
print(slug)  # quick-brown
```

---

### `batch_slugify`

| Argument      | Type           | Default      | Description                                                         |
|---------------|----------------|--------------|---------------------------------------------------------------------|
| texts         | Iterable[str]  | —            | List of strings to slugify.                                         |
| separator     | str            | "-"         | Word separator in the slug.                                         |
| custom_map    | dict           | None         | Custom character mappings.                                          |
| stopwords     | Iterable[str]  | None         | Words to exclude from slugs.                                        |
| lowercase     | bool           | True         | Convert result to lowercase.                                        |
| word_regex    | str            | None         | Custom regex pattern for word extraction (default: `r'\w+'`).      |
| parallel      | bool           | False        | Enable parallel processing.                                         |
| workers       | int            | None         | Number of parallel workers.                                         |
| mode          | str            | "thread"    | "thread", "process", or "serial".                                 |
| chunk_size    | int            | 1000         | Number of items per worker chunk.                                   |
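| cache_size    | int            | 2048         | Size of the internal cache.                                         |
| process_emoji | bool           | True         | If `False`, disables emoji-to-name conversion for max performance.  |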

**Returns:** `List[str]` (list of slugified strings)

**Example:**
```python
slugs = batch_slugify(["The quick brown fox", "Jumped over the lazy dog"])
print(slugs)  # ['the-quick-brown-fox', 'jumped-over-the-lazy-dog']
```

### `async_slugify(text, separator="-", custom_map=None)`
- Same as `slugify`, but async.

### `async_batch_slugify(texts, ...)`
- Same as `batch_slugify`, but async.
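
As a rough picture of how an async batch API can avoid blocking the event loop, here is a minimal sketch that bounds concurrency with a semaphore (the approach the performance notes later in this README attribute to `parallel=True`) and runs each call in a worker thread. `async_batch_sketch` and `simple_slug` are hypothetical stand-ins, not sluggi APIs, and the default of four concurrent calls is an assumption. `asyncio.to_thread` needs Python 3.9+, which matches sluggi's minimum.

```python
import asyncio
import re
from typing import Callable, List, Optional, Sequence


def simple_slug(text: str) -> str:
    # Stand-in for sluggi.slugify so the sketch runs without the package installed.
    return "-".join(re.findall(r"\w+", text)).lower()


async def async_batch_sketch(
    texts: Sequence[str],
    slug_fn: Callable[[str], str] = simple_slug,
    workers: Optional[int] = None,
) -> List[str]:
    """Slugify texts without blocking the event loop, at most `workers` at a time."""
    limit = asyncio.Semaphore(workers or 4)  # assumption: default of 4 when unset

    async def one(text: str) -> str:
        async with limit:
            # Run the synchronous call in a worker thread; the event loop stays free.
            return await asyncio.to_thread(slug_fn, text)

    # gather() returns results in the same order as the inputs.
    return list(await asyncio.gather(*(one(t) for t in texts)))


print(asyncio.run(async_batch_sketch(["Hello, world!", "Foo Bar"], workers=2)))
# ['hello-world', 'foo-bar']
```

Dispatching one thread call per item adds overhead, so for very large inputs handing off whole chunks is usually cheaper.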

---

## Advanced Usage & Performance Tips

### Skipping Emoji Handling for Maximum Speed
- By default, sluggi converts emoji to textual names (e.g., 😎, whose Unicode name is "smiling face with sunglasses") for compatibility and searchability.
- **For maximum performance**, you can disable emoji handling entirely if you do not need emoji-to-name conversion. This avoids all emoji detection and replacement logic, providing a measurable speedup for emoji-heavy or large datasets.
- To disable emoji handling:
  - **Python API:** Pass `process_emoji=False` to `slugify`, `batch_slugify`, or any pipeline config.
  - **CLI:** Add the `--no-process-emoji` flag to your command.

**Example:**
```python
slug = slugify("emoji: 😎🤖🎉", process_emoji=False)
print(slug)  # emoji
```

```bash
sluggi slug "emoji: 😎🤖🎉" --no-process-emoji
# Output: emoji
```
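
To illustrate what `process_emoji` toggles, here is a rough stand-in that replaces each emoji with its lowercased Unicode character name via the standard `unicodedata` module. It treats every character in Unicode category `So` ("Symbol, other") as an emoji, which is only an approximation, and `convert_emojis_sketch` and `slug_sketch` are hypothetical names whose output format is not necessarily sluggi's.

```python
import re
import unicodedata


def convert_emojis_sketch(text: str) -> str:
    """Replace each 'So' (Symbol, other) character with its lowercased Unicode name.

    Rough approximation: real emoji handling also deals with sequences,
    modifiers, and symbols that are not emoji.
    """
    parts = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            # Pad with spaces so the name splits into separate words later.
            parts.append(f" {unicodedata.name(ch, '').lower()} ")
        else:
            parts.append(ch)
    return "".join(parts)


def slug_sketch(text: str, process_emoji: bool = True) -> str:
    if process_emoji:
        text = convert_emojis_sketch(text)
    # With process_emoji=False, emoji are not matched by \w+ and simply drop out.
    return "-".join(re.findall(r"\w+", text)).lower()


print(slug_sketch("emoji: 😎"))                       # emoji-smiling-face-with-sunglasses
print(slug_sketch("emoji: 😎", process_emoji=False))  # emoji
```

Skipping the conversion removes a per-character category lookup, which is where the speedup for emoji-heavy input comes from in a design like this.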

### Batch and Async Performance
- **Parallel Processing:**
  - For large batches, use `parallel=True` and tune `workers` and `chunk_size`.
  - `mode="process"` enables true CPU parallelism for CPU-bound workloads.
  - `mode="thread"` is best for I/O-bound or repeated/cached inputs.
- **Caching:**
  - Threaded mode enables slugification result caching for repeated or overlapping inputs.
  - Process mode disables cache (each process is isolated).
- **Asyncio:**
  - Use `async_batch_slugify` for async web servers or event-driven apps.
  - The `parallel` option with async batch uses a semaphore to limit concurrency, avoiding event loop starvation.
  - For best throughput, set `workers` to your CPU count or the number of concurrent requests you expect.
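
The tuning advice above maps onto a familiar pattern: split the input into `chunk_size` pieces, hand the chunks to a thread or process pool of `workers`, and, in thread mode only, memoize results in a shared cache. The sketch below shows that pattern with `concurrent.futures` and `functools.lru_cache`; `batch_sketch` and `simple_slug` are hypothetical names, and sluggi's internals may differ.

```python
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import lru_cache
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def simple_slug(text: str) -> str:
    # Stand-in for sluggi.slugify; module-level so process pools can pickle it.
    return "-".join(re.findall(r"\w+", text)).lower()


# One cache per process; 2048 mirrors the documented cache_size default.
cached_slug = lru_cache(maxsize=2048)(simple_slug)


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def slug_chunk(chunk: List[str]) -> List[str]:
    return [simple_slug(t) for t in chunk]


def cached_slug_chunk(chunk: List[str]) -> List[str]:
    return [cached_slug(t) for t in chunk]


def batch_sketch(
    texts: Iterable[str],
    mode: str = "thread",
    workers: Optional[int] = None,
    chunk_size: int = 1000,
) -> List[str]:
    chunks = list(chunked(texts, chunk_size))
    if mode == "serial":
        return [s for chunk in chunks for s in slug_chunk(chunk)]
    if mode == "thread":
        # Threads share the process's cache, so repeated inputs are usually computed once.
        executor, fn = ThreadPoolExecutor, cached_slug_chunk
    elif mode == "process":
        # Each worker process has its own memory, so no cache is shared across workers.
        executor, fn = ProcessPoolExecutor, slug_chunk
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    with executor(max_workers=workers) as pool:
        # map() yields results in input order, so output order matches input order.
        return [s for chunk in pool.map(fn, chunks) for s in chunk]


if __name__ == "__main__":
    # Keep batch calls under a __main__ guard: process pools require it on spawn-based platforms.
    texts = ["Hello, world!", "Foo Bar"] * 3
    print(batch_sketch(texts, mode="thread", workers=2, chunk_size=2))
```

Larger chunks amortize the per-task scheduling and pickling cost, which matters most in process mode.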

#### Example: Tuning Batch Processing
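```python
from fastapi import FastAPI
from sluggi import async_batch_slugify, batch_slugify

# Large batch, CPU-bound: use a process pool.
# Process pools need a __main__ guard on spawn-based platforms (Windows, macOS).
if __name__ == "__main__":
    my_list = [f"Item number {i}" for i in range(100_000)]  # example input
    slugs = batch_slugify(my_list, parallel=True, mode="process", workers=8, chunk_size=500)

# Async batch in a web API (FastAPI shown; the same await works in any async framework)
app = FastAPI()


@app.post("/bulk-slugify")
async def bulk_slugify(payload: list[str]) -> list[str]:
    return await async_batch_slugify(payload, parallel=True, workers=8)
```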

### When to Use Serial vs Parallel vs Async
- **Serial:** Small batches, low latency, or single-threaded environments.
- **Parallel (thread/process):** Large batches, heavy CPU work, or when maximizing throughput is critical.
- **Async:** Integrate with modern async web frameworks, handle many concurrent requests, or avoid blocking the event loop.

See the docstrings and API reference for more details on each option.

## Command-Line Interface (CLI)

Install the CLI extra from a local clone of the repository:
```bash
pip install ".[cli]"
```

### Quick Start
```bash
sluggi slug "Γειά σου Κόσμε"
# Output: geia-sou-kosme

sluggi slug "The quick brown fox jumps" --stopwords "the,fox"
# Output: quick-brown-jumps

sluggi slug "The Quick Brown Fox" --word-regex "[A-Z][a-z]+"
# Output: the-quick-brown-fox

sluggi slug "The Quick Brown Fox" --no-lowercase
# Output: The-Quick-Brown-Fox

sluggi batch --input names.txt --output slugs.txt
sluggi batch --input names.txt --word-regex "[A-Z][a-z]+" --no-lowercase

# Custom output formatting in batch mode:
sluggi batch --input names.txt --output-format "{line_num}: {original} -> {slug}"
# Output example:
# 1: Foo Bar -> foo-bar
# 2: Baz Qux -> baz-qux

sluggi batch --input names.txt --output-format "{slug}"
# Output: one slug per line (the default format)

# Display results as a rich table in the console:
sluggi batch --input names.txt --display-output
# Output example (with rich):
# ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━┓
# ┃ row_number ┃ original     ┃ slug     ┃
# ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━┩
# │ 1          │ Foo Bar      │ foo-bar  │
# │ 2          │ Baz Qux      │ baz-qux  │
# └────────────┴──────────────┴──────────┘
```

**Supported placeholders for --output-format:**
- `{slug}`: The generated slug
- `{original}`: The original input line
- `{line_num}`: The 1-based line number
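
The placeholders read like Python `str.format` fields. Assuming the CLI fills them that way (an assumption, not something stated above), a template renders each line as in this hypothetical helper. One consequence of `str.format` semantics is that a literal brace in the template must be doubled (`{{` or `}}`).

```python
from typing import Iterable, List


def format_lines(
    originals: Iterable[str], slugs: Iterable[str], output_format: str = "{slug}"
) -> List[str]:
    """Render one line per input using the {slug}, {original}, {line_num} fields."""
    return [
        output_format.format(slug=slug, original=original, line_num=line_num)
        for line_num, (original, slug) in enumerate(zip(originals, slugs), start=1)
    ]


for line in format_lines(
    ["Foo Bar", "Baz Qux"], ["foo-bar", "baz-qux"], "{line_num}: {original} -> {slug}"
):
    print(line)
# 1: Foo Bar -> foo-bar
# 2: Baz Qux -> baz-qux
```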

**Note:** The `--display-output` table uses the [rich](https://github.com/Textualize/rich) Python library. If not installed, a plain text table will be shown instead.

### CLI Options

| Option           | Description                                                      |
|------------------|------------------------------------------------------------------|
| `--separator`    | Separator for words in the slug (default: `-`).                 |
| `--stopwords`    | Comma-separated words to exclude from slug.                     |
| `--custom-map`   | Custom mapping as JSON, e.g. `'{"ä": "ae"}'`.                |
| `--word-regex`   | Custom regex pattern for word extraction (default: `\w+`).     |
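| `--no-lowercase` | Preserve capitalization in the slug (slugs are lowercased by default). |
| `--no-process-emoji` | Skip emoji-to-name conversion for faster processing.          |
| `--mode`         | Batch execution mode: `serial`, `thread`, or `process`.          |
| `--workers`      | Number of workers for parallel batch processing.                 |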
| `--output-format`| Custom output format for batch mode. Supports `{slug}`, `{original}`, `{line_num}`. Default: just the slug. |
| `--display-output`| Display results as a rich table in the console after batch processing. |
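
`--stopwords` and `--custom-map` take their values as plain strings. A minimal sketch of turning those strings into the Python arguments described in the API reference follows; the helper names are hypothetical, and details such as trimming whitespace around stopwords are assumptions, not descriptions of sluggi's CLI.

```python
import json
from typing import Dict, List


def parse_stopwords(value: str) -> List[str]:
    """Split a comma-separated --stopwords value, dropping blanks and padding."""
    return [word.strip() for word in value.split(",") if word.strip()]


def parse_custom_map(value: str) -> Dict[str, str]:
    """Parse a --custom-map value that must be a JSON object of string -> string."""
    data = json.loads(value)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict) or not all(isinstance(v, str) for v in data.values()):
        raise ValueError("--custom-map must be a JSON object mapping strings to strings")
    return data


print(parse_stopwords("the, fox,,"))    # ['the', 'fox']
print(parse_custom_map('{"ä": "ae"}'))  # {'ä': 'ae'}
```

On the command line, single-quoting the JSON (as in the table above) keeps the shell from interpreting its double quotes.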


### CLI Help
```bash
sluggi --help
```

### Error Handling Example
```bash
sluggi batch --input missing.txt
# Output (printed in bold red):
# Input file not found: missing.txt
```

## Development & Contributing

- Clone the repo:
  ```bash
  git clone https://github.com/blip-box/sluggi.git
  cd sluggi
  ```
- Create a virtual environment and install dependencies using uv:
  ```bash
  uv venv
  uv pip install ".[dev,cli]"
  ```
- Run tests and lints:
  ```bash
  pytest
  ruff check src/sluggi tests
  black --check src/sluggi tests
  ```
- Pre-commit hooks:
  ```bash
  pre-commit install
  pre-commit run --all-files
  ```
- PRs and issues welcome!

## Encoding Notes
- Input and output files must be UTF-8 encoded.
- On Windows, use a UTF-8 capable terminal or set the environment variable `PYTHONUTF8=1` if you encounter encoding issues.
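
If you generate input files for `sluggi batch` from Python, or read its output back, pass the encoding explicitly rather than relying on the platform default. A small sketch using only `pathlib`; the helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_lines_utf8(path: Path, lines: Iterable[str]) -> None:
    # An explicit encoding avoids depending on the platform default (not always UTF-8 on Windows).
    path.write_text("".join(f"{line}\n" for line in lines), encoding="utf-8")


def read_lines_utf8(path: Path) -> List[str]:
    return path.read_text(encoding="utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    names = Path(tmp) / "names.txt"
    write_lines_utf8(names, ["Γειά σου Κόσμε", "Привет мир"])
    print(read_lines_utf8(names))  # ['Γειά σου Κόσμε', 'Привет мир']
```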

### Help and Examples
- Run `sluggi --help` or any subcommand with `--help` to see detailed usage and examples directly in your terminal.

---

## Performance & Benchmarks

Batch slugification performance was measured using the included benchmark script:

```bash
python scripts/benchmark_batch.py
```

**Results on 20,000 random strings:**

| Mode     | Time (s) | Avg ms/item |
|----------|----------|-------------|
| Serial   | 0.74     | 0.037       |
| Thread   | 0.62–0.72| 0.031–0.036 |
| Process  | 1.55–1.73| 0.078–0.086 |

- **Serial** is fast and reliable for most workloads.
- **Thread** mode (the default for `--parallel`) was slightly faster than serial in this run and suits I/O-bound or lightweight CPU tasks.
- **Process** mode (multiprocessing) enables true CPU parallelism, but its startup and inter-process overhead made it more than twice as slow as serial here; it pays off only for very CPU-bound or expensive slugification tasks.
- Use `--mode process` for multiprocessing, `--mode thread` for threads, or `--mode serial` for no parallelism. Combine with `--workers` to tune performance.
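
The two table columns are related by simple arithmetic: average ms/item = total seconds × 1000 / 20,000, so the serial row's 0.74 s works out to 0.037 ms per item. A minimal timing harness in the same spirit is sketched below; it is not the contents of `scripts/benchmark_batch.py`, and `simple_slug` is a placeholder workload, so its numbers will not match the table.

```python
import random
import re
import string
import time
from typing import Callable, List, Tuple


def simple_slug(text: str) -> str:
    # Placeholder workload; swap in sluggi.slugify to measure the library itself.
    return "-".join(re.findall(r"\w+", text)).lower()


def random_strings(n: int, length: int = 30, seed: int = 0) -> List[str]:
    rng = random.Random(seed)  # fixed seed so runs are comparable
    alphabet = string.ascii_letters + "   ,.!"
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(n)]


def time_batch(fn: Callable[[List[str]], List[str]], items: List[str]) -> Tuple[float, float]:
    """Return (total seconds, average milliseconds per item)."""
    start = time.perf_counter()
    fn(items)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed * 1000 / len(items)


items = random_strings(20_000)
total_s, ms_per_item = time_batch(lambda xs: [simple_slug(x) for x in xs], items)
print(f"Serial: {total_s:.2f} s, {ms_per_item:.3f} ms/item")
```

Timing a single run is noisy; repeating each mode several times and reporting a range, as the table does for thread and process modes, gives a fairer comparison.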

**Script location:** `scripts/benchmark_batch.py`

### Shell Completion
Enable tab-completion for your shell (bash, zsh, fish):
```bash
sluggi completion bash   # or zsh, fish
# Follow the printed instructions to enable completion in your shell
```

## License
MIT

---

[Changelog (GitHub Releases)](https://github.com/blip-box/sluggi/releases)

> **Note:** This project is a complete rewrite, inspired by existing slugify libraries, but aims to set a new standard for speed, correctness, and extensibility in Python.

---

### See Also

This project was inspired by the Java library [slugify by akullpp](https://github.com/akullpp/slugify). If you need Java or Gradle support, see their documentation for advanced transliteration and custom replacements.

Example (Java):
```java
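import com.github.slugify.Slugify;
import java.util.Map;

// Both the map-based and the per-entry builder methods are shown; either one is enough.
final Slugify slg = Slugify.builder()
    .customReplacements(Map.of("Foo", "Hello", "bar", "world"))
    .customReplacement("Foo", "Hello")
    .customReplacement("bar", "world")
    .build();
final String result = slg.slugify("Foo, bar!");
// result: hello-world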
```

For advanced transliteration in Java:
```groovy
capabilities {
    requireCapability('com.github.slugify:slugify-transliterator')
}
```
Or add the optional dependency `com.ibm.icu:icu4j` to your project.

---

✨ **New Automation & Collaboration Features**

- **Adaptive triage workflows**: Issues and PRs are auto-labeled and parsed for agent/human status, and incomplete PRs are auto-closed, saving time for everyone.
- **Agent-ready templates**: All issue and PR templates are designed for both humans and autonomous agents, with structured metadata and feedback built in.
- **Playground workflow**: Safely experiment, test, or self-heal code with the new playground automation—perfect for bots and contributors alike.

See [.github/workflows/README.md](.github/workflows/README.md) for more details on these next-generation automations!

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "sluggi",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "slug, slugify, url, text, python, unicode, transliteration, cli",
    "author": "atillaguzel",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/de/7f/89cbbb659d6dfe54f90f0926ff7e9ad104043c6e9fbe4f67ee4381314483/sluggi-0.1.2.tar.gz",
    "platform": null,
    "description": "# sluggi\n\n**sluggi** \u2014 The modern, blazing-fast Python library and CLI for turning any text into clean, URL-safe slugs.\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![PyPI](https://img.shields.io/pypi/v/sluggi.svg?logo=pypi)](https://pypi.org/project/sluggi/)\n[![CI](https://github.com/blip-box/sluggi/actions/workflows/ci.yml/badge.svg)](https://github.com/blip-box/sluggi/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/blip-box/sluggi/branch/main/graph/badge.svg)](https://codecov.io/gh/blip-box/sluggi)\n[![Python Version](https://img.shields.io/pypi/pyversions/sluggi.svg)](https://pypi.org/project/sluggi/)\n[![Changelog](https://img.shields.io/badge/changelog-md-blue)](https://github.com/blip-box/sluggi/releases)\n\n\n\n> Inspired by slugify, reimagined for speed, Unicode, and robust parallel batch processing.\n\n---\n\n## Table of Contents\n- [Features](#features)\n- [Installation](#installation)\n- [Usage](#usage)\n- [API Reference](#api-reference)\n- [Advanced Usage & Performance Tips](#advanced-usage--performance-tips)\n- [Command-Line Interface (CLI)](#command-line-interface-cli)\n- [Development & Contributing](#development--contributing)\n- [Performance & Benchmarks](#performance--benchmarks)\n- [License](#license)\n- [See Also](#see-also)\n\n---\n\n## Features\n- \ud83d\ude80 **Fast:** Optimized for speed with minimal dependencies.\n- \ud83c\udf0d **Unicode & Emoji:** Handles dozens of scripts, emoji, and edge cases out of the box.\n- \ud83d\udd27 **Customizable:** Define your own character mappings and rules.\n- \ud83e\uddf5 **Parallel Batch:** True multi-core batch slugification (thread/process/serial modes).\n- \u26a1 **Async Support:** Full asyncio-compatible API for modern Python apps.\n- \ud83d\udda5\ufe0f **CLI Tool:** Powerful, colorized CLI for quick slug generation and batch jobs.\n- \ud83d\udd12 **Safe Output:** Always generates URL-safe, predictable slugs.\n- 
\ud83e\udde9 **Extensible API:** Easy to use and extend.\n- \u2705 **CI & Pre-commit:** Linting, formatting, and tests run automatically.\n\n## Modular Slugification Pipeline\n\nsluggi processes text through a modular pipeline of single-responsibility functions, making the codebase more readable, maintainable, and extensible. Each step in the pipeline performs a distinct transformation, allowing for easy customization and extension.\n\n**Pipeline Steps:**\n\n1. **normalize_unicode(text)**\n   Normalize Unicode characters to a canonical form (NFKC).\n2. **decode_html_entities_and_refs(text)**\n   Decode HTML entities and character references to their Unicode equivalents.\n3. **convert_emojis(text)**\n   Replace emojis with their textual representations.\n4. **transliterate_text(text)**\n   Transliterate non-ASCII characters to ASCII (where possible).\n5. **apply_custom_replacements(text, custom_map)**\n   Apply user-defined or staged character/string replacements.\n6. **extract_words(text, word_regex)**\n   Extract words using a customizable regex pattern.\n7. **filter_stopwords(words, stopwords)**\n   Remove unwanted words (e.g., stopwords) from the list.\n8. **join_words(words, separator)**\n   Join words using the specified separator.\n9. **to_lowercase(text, lowercase)**\n   Convert the result to lowercase if requested.\n10. **strip_separators(text, separator)**\n    Remove leading/trailing separators.\n11. 
**smart_truncate(text, max_length, separator)**\n    Optionally truncate the slug at a word boundary.\n\n**Processing Flow:**\n\n    Input Text\n      \u2193\n    normalize_unicode\n      \u2193\n    decode_html_entities_and_refs\n      \u2193\n    convert_emojis\n      \u2193\n    transliterate_text\n      \u2193\n    apply_custom_replacements\n      \u2193\n    extract_words\n      \u2193\n    filter_stopwords\n      \u2193\n    join_words\n      \u2193\n    to_lowercase\n      \u2193\n    strip_separators\n      \u2193\n    smart_truncate\n      \u2193\n    Final Slug\n\nThis modular approach makes it easy to add, remove, or modify steps in the pipeline. Each function is pure and well-documented. See the API docs and source for details on customizing or extending the pipeline.\n\n## Installation\nInstall from PyPI:\n```bash\npip install sluggi\n```\n\nFor CLI and development:\n```bash\npip install .[cli,dev]\n```\n\n## Usage\n```python\nfrom sluggi import slugify, batch_slugify\n\nslug = slugify(\"Hello, world!\")\nprint(slug)  # hello-world\n\n# Batch processing (parallel by default)\nslugs = batch_slugify([\"Hello, world!\", \"\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440\"])\nprint(slugs)  # ['hello-world', 'privet-mir']\n\n# Advanced: Parallel processing\nslugs = batch_slugify([\"foo\", \"bar\"], parallel=True, mode=\"process\", workers=2)\n\n# Stopwords (exclude common words from slugs)\nslug = slugify(\"The quick brown fox jumps\", stopwords=[\"the\", \"fox\"])\nprint(slug)  # quick-brown-jumps\n\nslugs = batch_slugify([\n    \"The quick brown fox jumps\",\n    \"Jump over the lazy dog\"\n], stopwords=[\"the\", \"over\", \"dog\"])\nprint(slugs)  # ['quick-brown-fox-jumps', 'jump-lazy']\n\n# Custom regex pattern for word extraction (e.g., only extract capitalized words)\nslug = slugify(\"The Quick Brown Fox\", word_regex=r\"[A-Z][a-z]+\")\nprint(slug)  # The-Quick-Brown-Fox\n\n# Use in batch_slugify\nslugs = batch_slugify([\n    \"The Quick Brown 
Fox\",\n    \"Jump Over The Lazy Dog\"\n], word_regex=r\"[A-Z][a-z]+\")\nprint(slugs)  # ['The-Quick-Brown-Fox', 'Jump-Over-The-Lazy-Dog']\n```\n\n### Async Usage\nRequires Python 3.7+\n```python\nimport asyncio\nfrom sluggi import async_slugify, async_batch_slugify\n\nasync def main():\n    slug = await async_slugify(\"Hello, world!\")\n    slugs = await async_batch_slugify([\"Hello, world!\", \"\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440\"], parallel=True)\n    print(slug)   # hello-world\n    print(slugs)  # ['hello-world', 'privet-mir']\n\nasyncio.run(main())\n```\n\n### Custom Separator\n```python\nslug = slugify(\"Hello, world!\", separator=\"_\")\nprint(slug)  # hello_world\n```\n\n### Stopwords\n```python\nslug = slugify(\"The quick brown fox\", stopwords=[\"the\", \"fox\"])\nprint(slug)  # quick-brown\n```\n\n### Custom Mapping\n```python\nslug = slugify(\"\u00e4 \u00f6 \u00fc\", custom_map={\"\u00e4\": \"ae\", \"\u00f6\": \"oe\", \"\u00fc\": \"ue\"})\nprint(slug)  # ae-oe-ue\n```\n\n---\n\n## API Reference\n\n### `slugify`\n\n| Argument      | Type           | Default   | Description                                                                 |\n|---------------|----------------|-----------|-----------------------------------------------------------------------------|\n| text          | str            | \u2014         | The input string to slugify.                                                |\n| separator     | str            | \"-\"      | Word separator in the slug.                                                 |\n| custom_map    | dict           | None      | Custom character mappings.                                                  |\n| stopwords     | Iterable[str]  | None      | Words to exclude from the slug (case-insensitive if `lowercase=True`).      |\n| lowercase     | bool           | True      | Convert result to lowercase.                                                
|\n| word_regex    | str            | None      | Custom regex pattern for word extraction (default: `r'\\w+'`).              |\n| process_emoji | bool           | True      | If `False`, disables emoji-to-name conversion for max performance.          |\n\n**Returns:** `str` (slugified string)\n\n**Example:**\n```python\nslug = slugify(\"The quick brown fox\", stopwords=[\"the\", \"fox\"])\nprint(slug)  # quick-brown\n```\n\n---\n\n### `batch_slugify`\n\n| Argument      | Type           | Default      | Description                                                         |\n|---------------|----------------|--------------|---------------------------------------------------------------------|\n| texts         | Iterable[str]  | \u2014            | List of strings to slugify.                                         |\n| separator     | str            | \"-\"         | Word separator in the slug.                                         |\n| custom_map    | dict           | None         | Custom character mappings.                                          |\n| stopwords     | Iterable[str]  | None         | Words to exclude from slugs.                                        |\n| lowercase     | bool           | True         | Convert result to lowercase.                                        |\n| word_regex    | str            | None         | Custom regex pattern for word extraction (default: `r'\\w+'`).      |\n| parallel      | bool           | False        | Enable parallel processing.                                         |\n| workers       | int            | None         | Number of parallel workers.                                         |\n| mode          | str            | \"thread\"    | \"thread\", \"process\", or \"serial\".                                 |\n| chunk_size    | int            | 1000         | Number of items per worker chunk.                                   |\n| cache_size    | int            | 2048         | Size of the internal cache. 
                                        |\n\n**Returns:** `List[str]` (list of slugified strings)\n\n**Example:**\n```python\nslugs = batch_slugify([\"The quick brown fox\", \"Jumped over the lazy dog\"])\nprint(slugs)  # ['quick-brown', 'jumped-over-the-lazy-dog']\n```\n\n### `async_slugify(text, separator=\"-\", custom_map=None)`\n- Same as `slugify`, but async.\n\n### `async_batch_slugify(texts, ...)`\n- Same as `batch_slugify`, but async.\n\n---\n\n## Advanced Usage & Performance Tips\n\n### Skipping Emoji Handling for Maximum Speed\n- By default, sluggi converts emoji to their textual names (e.g., \ud83d\ude0e \u2192 smiley-face) for maximum compatibility and searchability.\n- **For maximum performance**, you can disable emoji handling entirely if you do not need emoji-to-name conversion. This avoids all emoji detection and replacement logic, providing a measurable speedup for emoji-heavy or large datasets.\n- To disable emoji handling:\n  - **Python API:** Pass `process_emoji=False` to `slugify`, `batch_slugify`, or any pipeline config.\n  - **CLI:** Add the `--no-process-emoji` flag to your command.\n\n**Example:**\n```python\nslug = slugify(\"emoji: \ud83d\ude0e\ud83e\udd16\ud83c\udf89\", process_emoji=False)\nprint(slug)  # emoji\n```\n\n```bash\nsluggi slug \"emoji: \ud83d\ude0e\ud83e\udd16\ud83c\udf89\" --no-process-emoji\n# Output: emoji\n```\n\n### Batch and Async Performance\n- **Parallel Processing:**\n  - For large batches, use `parallel=True` and tune `workers` and `chunk_size`.\n  - `mode=\"process\"` enables true CPU parallelism for CPU-bound workloads.\n  - `mode=\"thread\"` is best for I/O-bound or repeated/cached inputs.\n- **Caching:**\n  - Threaded mode enables slugification result caching for repeated or overlapping inputs.\n  - Process mode disables cache (each process is isolated).\n- **Asyncio:**\n  - Use `async_batch_slugify` for async web servers or event-driven apps.\n  - The `parallel` option with async batch uses a semaphore to 
limit concurrency, avoiding event loop starvation.\n  - For best throughput, set `workers` to your CPU count or the number of concurrent requests you expect.\n\n#### Example: Tuning Batch Processing\n```python\n# Large batch, CPU-bound: use process pool\nslugs = batch_slugify(my_list, parallel=True, mode=\"process\", workers=8, chunk_size=500)\n\n# Async batch in a web API (FastAPI, Starlette, etc.)\nfrom sluggi import async_batch_slugify\n\n@app.post(\"/bulk-slugify\")\nasync def bulk_slugify(payload: list[str]):\n    return await async_batch_slugify(payload, parallel=True, workers=8)\n```\n\n### When to Use Serial vs Parallel vs Async\n- **Serial:** Small batches, low latency, or single-threaded environments.\n- **Parallel (thread/process):** Large batches, heavy CPU work, or when maximizing throughput is critical.\n- **Async:** Integrate with modern async web frameworks, handle many concurrent requests, or avoid blocking the event loop.\n\nSee the docstrings and API reference for more details on each option.\n\n## Command-Line Interface (CLI)\n\nInstall CLI dependencies:\n```bash\npip install .[cli]\n```\n\n### Quick Start\n```bash\nsluggi slug \"\u0393\u03b5\u03b9\u03ac \u03c3\u03bf\u03c5 \u039a\u03cc\u03c3\u03bc\u03b5\"\n# Output: geia-sou-kosme\n\nsluggi slug \"The quick brown fox jumps\" --stopwords \"the,fox\"\n# Output: quick-brown-jumps\n\nsluggi slug \"The Quick Brown Fox\" --word-regex \"[A-Z][a-z]+\"\n# Output: The-Quick-Brown-Fox\n\nsluggi slug \"The Quick Brown Fox\" --no-lowercase\n# Output: The-Quick-Brown-Fox\n\nsluggi batch --input names.txt --output slugs.txt\nsluggi batch --input names.txt --word-regex \"[A-Z][a-z]+\" --no-lowercase\n\n# Custom output formatting in batch mode:\nsluggi batch --input names.txt --output-format \"{line_num}: {original} -> {slug}\"\n# Output example:\n# 1: Foo Bar -> foo-bar\n# 2: Baz Qux -> baz-qux\n\nsluggi batch --input names.txt --output-format \"{slug}\"\n# Output: just the slug, as before\n\n# Display results 
as a rich table in the console:\nsluggi batch --input names.txt --display-output\n# Output example (with rich):\n# \u250f\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2533\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2513\n# \u2503 row_number \u2503 original     \u2503 slug     \u2503\n# \u2521\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2547\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2529\n# \u2502 1          \u2502 Foo Bar      \u2502 foo-bar  \u2502\n# \u2502 2          \u2502 Baz Qux      \u2502 baz-qux  \u2502\n# \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n```\n\n**Supported placeholders for --output-format:**\n- `{slug}`: The generated slug\n- `{original}`: The original input line\n- `{line_num}`: The 1-based line number\n\n**Note:** The `--display-output` table uses the [rich](https://github.com/Textualize/rich) Python library. If not installed, a plain text table will be shown instead.\n\n### CLI Options\n\n| Option           | Description                                                      |\n|------------------|------------------------------------------------------------------|\n| `--separator`    | Separator for words in the slug (default: `-`).                 |\n| `--stopwords`    | Comma-separated words to exclude from slug.                     |\n| `--custom-map`   | Custom mapping as JSON, e.g. `'{\"\u00e4\": \"ae\"}'`.                |\n| `--word-regex`   | Custom regex pattern for word extraction (default: `\\w+`).     
|\n| `--no-lowercase` | Preserve capitalization in the slug (default: off, so slugs are lowercased). |\n| `--output-format`| Custom output format for batch mode. Supports `{slug}`, `{original}`, `{line_num}`. Default: just the slug. |\n| `--display-output`| Display results as a rich table in the console after batch processing. |\n\n\n### CLI Help\n```bash\nsluggi --help\n```\n\n### Error Handling Example\n```bash\nsluggi batch --input missing.txt\n# Output (printed in bold red):\n# Input file not found: missing.txt\n```\n\n## Development & Contributing\n\n- Clone the repo:\n  ```bash\n  git clone https://github.com/blip-box/sluggi.git\n  cd sluggi\n  ```\n- Create a virtual environment and install dependencies using uv:\n  ```bash\n  uv venv\n  uv pip install \".[dev,cli]\"\n  ```\n- Run tests and lints:\n  ```bash\n  pytest\n  ruff check src/sluggi tests\n  black --check src/sluggi tests\n  ```\n- Pre-commit hooks:\n  ```bash\n  pre-commit install\n  pre-commit run --all-files\n  ```\n- PRs and issues are welcome!\n\n## Encoding Notes\n- Input and output files must be UTF-8 encoded.\n- On Windows, use a UTF-8 capable terminal or set the environment variable `PYTHONUTF8=1` if you encounter encoding issues.\n\n### Help and Examples\n- Run `sluggi --help` or any subcommand with `--help` to see detailed usage and examples directly in your terminal.\n\n---\n\n## Performance & Benchmarks\n\nBatch slugification performance was measured using the included benchmark script:\n\n```bash\npython scripts/benchmark_batch.py\n```\n\n**Results on 20,000 random strings:**\n\n| Mode     | Time (s) | Avg ms/item |\n|----------|----------|-------------|\n| Serial   | 0.74     | 0.037       |\n| Thread   | 0.62\u20130.72| 0.031\u20130.036 |\n| Process  | 1.55\u20131.73| 0.078\u20130.086 |\n\n- **Serial** is fast and reliable for most workloads.\n- **Thread** mode may be slightly faster for I/O-bound or lightweight CPU tasks (it is the default when `--parallel` is set).\n- **Process** mode (multiprocessing) enables true CPU 
parallelism, but has higher overhead and is best for very CPU-bound or expensive slugification tasks.\n- Use `--mode process` for multiprocessing, `--mode thread` for threads, or `--mode serial` for no parallelism. Combine with `--workers` to tune performance.\n\n**Script location:** `scripts/benchmark_batch.py`\n\n### Shell Completion\nEnable tab-completion for your shell (bash, zsh, fish):\n```bash\nsluggi completion bash   # or zsh, fish\n# Follow the printed instructions to enable completion in your shell\n```\n\n## License\nMIT\n\n---\n\n[Changelog (GitHub Releases)](https://github.com/blip-box/sluggi/releases)\n\n> **Note:** This project is a complete rewrite, inspired by existing slugify libraries, but aims to set a new standard for speed, correctness, and extensibility in Python.\n\n---\n\n## See Also\n\nThis project was inspired by the Java library [slugify by akullpp](https://github.com/akullpp/slugify). If you need Java or Gradle support, see their documentation for advanced transliteration and custom replacements.\n\nExample (Java):\n```java\nfinal Slugify slg = Slugify.builder()\n    .customReplacements(Map.of(\"Foo\", \"Hello\", \"bar\", \"world\"))\n    // Equivalent one-at-a-time form of the map above:\n    .customReplacement(\"Foo\", \"Hello\")\n    .customReplacement(\"bar\", \"world\")\n    .build();\nfinal String result = slg.slugify(\"Foo, bar!\");\n// result: hello-world\n```\n\nFor advanced transliteration in Java, require the transliterator capability in Gradle:\n```groovy\ncapabilities {\n    requireCapability('com.github.slugify:slugify-transliterator')\n}\n```\nOr add the optional dependency `com.ibm.icu:icu4j` to your project.\n\n---\n\n\u2728 **New Automation & Collaboration Features**\n\n- **Adaptive triage workflows**: Issues and PRs are now auto-labeled, parsed for agent/human status, and incomplete PRs are auto-closed for you, saving time for everyone.\n- **Agent-ready templates**: All issue and PR templates are designed for both humans and autonomous agents, with structured metadata and feedback built in.\n- **Playground 
workflow**: Safely experiment with, test, or self-heal code using the new playground automation, designed for bots and human contributors alike.\n\nSee [.github/workflows/README.md](.github/workflows/README.md) for more details on these automations.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A modern, high-performance Python library for turning text into clean, URL-safe slugs.",
    "version": "0.1.2",
    "project_urls": {
        "Changelog": "https://github.com/blip-box/sluggi/releases",
        "Documentation": "https://github.com/blip-box/sluggi#readme",
        "Homepage": "https://github.com/blip-box/sluggi",
        "Issues": "https://github.com/blip-box/sluggi/issues",
        "Repository": "https://github.com/blip-box/sluggi"
    },
    "split_keywords": [
        "slug",
        " slugify",
        " url",
        " text",
        " python",
        " unicode",
        " transliteration",
        " cli"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8caf6ead89a4f9c2b1c1f14e813f6a00b9251fb443b17590ea031ae9f15f4143",
                "md5": "824305003c0f81b35190b73b23951aa6",
                "sha256": "b2d493194f57c488e72c57f149806c64b2458ec441005f826fd7cb10326b6417"
            },
            "downloads": -1,
            "filename": "sluggi-0.1.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "824305003c0f81b35190b73b23951aa6",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 28330,
            "upload_time": "2025-10-13T20:40:56",
            "upload_time_iso_8601": "2025-10-13T20:40:56.681496Z",
            "url": "https://files.pythonhosted.org/packages/8c/af/6ead89a4f9c2b1c1f14e813f6a00b9251fb443b17590ea031ae9f15f4143/sluggi-0.1.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "de7f89cbbb659d6dfe54f90f0926ff7e9ad104043c6e9fbe4f67ee4381314483",
                "md5": "f12919b76da92ed08681b552024715ef",
                "sha256": "5ecaacca10697635782312b2be742deed3652265a303dfc2f67709b7d37fb539"
            },
            "downloads": -1,
            "filename": "sluggi-0.1.2.tar.gz",
            "has_sig": false,
            "md5_digest": "f12919b76da92ed08681b552024715ef",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 35670,
            "upload_time": "2025-10-13T20:40:57",
            "upload_time_iso_8601": "2025-10-13T20:40:57.896427Z",
            "url": "https://files.pythonhosted.org/packages/de/7f/89cbbb659d6dfe54f90f0926ff7e9ad104043c6e9fbe4f67ee4381314483/sluggi-0.1.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-13 20:40:57",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "blip-box",
    "github_project": "sluggi",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "sluggi"
}
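The `--output-format` placeholders (`{slug}`, `{original}`, `{line_num}`) and the `--stopwords` option described in the README above can be illustrated with a small, self-contained Python sketch. The sketch makes two assumptions that the docs do not confirm. First, it assumes the placeholders follow `str.format` semantics. Second, it uses `toy_slugify` and `format_batch`, which are hypothetical helpers written for illustration. They are not part of sluggi's API, and the toy slugifier skips transliteration entirely.

```python
import re
from typing import Iterable, List, Optional


def toy_slugify(
    text: str,
    separator: str = "-",
    stopwords: Optional[Iterable[str]] = None,
    word_regex: str = r"\w+",
    lowercase: bool = True,
) -> str:
    """Hypothetical stand-in for a slugifier (not sluggi's implementation)."""
    if lowercase:
        text = text.lower()
    # Extract words with the pattern (\w+ mirrors the documented --word-regex default).
    words = re.findall(word_regex, text)
    # Drop stopwords case-insensitively (mirrors --stopwords "the,fox").
    stop = {w.strip().lower() for w in (stopwords or [])}
    return separator.join(w for w in words if w.lower() not in stop)


def format_batch(lines: Iterable[str], output_format: str = "{slug}") -> List[str]:
    """Render each input line with the assumed str.format-style placeholders."""
    rendered = []
    for line_num, original in enumerate(lines, start=1):  # 1-based, like {line_num}
        rendered.append(
            output_format.format(
                slug=toy_slugify(original),
                original=original,
                line_num=line_num,
            )
        )
    return rendered


if __name__ == "__main__":
    for row in format_batch(["Foo Bar", "Baz Qux"], "{line_num}: {original} -> {slug}"):
        print(row)
    # 1: Foo Bar -> foo-bar
    # 2: Baz Qux -> baz-qux
    print(toy_slugify("The quick brown fox jumps", stopwords="the,fox".split(",")))
    # quick-brown-jumps
```

Only the template string is parsed by `str.format`. Braces that appear inside an input line are therefore copied through literally rather than being treated as placeholders.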

atillaguzel