# sluggi
**sluggi** is the modern, blazing-fast Python library and CLI for turning any text into clean, URL-safe slugs.
> Inspired by slugify, reimagined for speed, Unicode, and robust parallel batch processing.
---
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
- [API Reference](#api-reference)
- [Advanced Usage & Performance Tips](#advanced-usage--performance-tips)
- [Command-Line Interface (CLI)](#command-line-interface-cli)
- [Development & Contributing](#development--contributing)
- [Performance & Benchmarks](#performance--benchmarks)
- [License](#license)
- [See Also](#see-also)
---
## Features
- 🚀 **Fast:** Optimized for speed with minimal dependencies.
- 🌍 **Unicode & Emoji:** Handles dozens of scripts, emoji, and edge cases out of the box.
- 🔧 **Customizable:** Define your own character mappings and rules.
- 🧵 **Parallel Batch:** Batch slugification in serial, thread, or process mode (process mode uses multiple CPU cores).
- ⚡ **Async Support:** Full asyncio-compatible API for modern Python apps.
- 🖥️ **CLI Tool:** Powerful, colorized CLI for quick slug generation and batch jobs.
- 🔒 **Safe Output:** Always generates URL-safe, predictable slugs.
- 🧩 **Extensible API:** Easy to use and extend.
- ✅ **CI & Pre-commit:** Linting, formatting, and tests run automatically.
## Modular Slugification Pipeline
sluggi processes text through a modular pipeline of single-responsibility functions, making the codebase more readable, maintainable, and extensible. Each step in the pipeline performs a distinct transformation, allowing for easy customization and extension.
**Pipeline Steps:**
1. **normalize_unicode(text)**
Normalize Unicode characters to a canonical form (NFKC).
2. **decode_html_entities_and_refs(text)**
Decode HTML entities and character references to their Unicode equivalents.
3. **convert_emojis(text)**
Replace emojis with their textual representations.
4. **transliterate_text(text)**
Transliterate non-ASCII characters to ASCII (where possible).
5. **apply_custom_replacements(text, custom_map)**
Apply user-defined or staged character/string replacements.
6. **extract_words(text, word_regex)**
Extract words using a customizable regex pattern.
7. **filter_stopwords(words, stopwords)**
Remove unwanted words (e.g., stopwords) from the list.
8. **join_words(words, separator)**
Join words using the specified separator.
9. **to_lowercase(text, lowercase)**
Convert the result to lowercase if requested.
10. **strip_separators(text, separator)**
Remove leading/trailing separators.
11. **smart_truncate(text, max_length, separator)**
Optionally truncate the slug at a word boundary.
**Processing Flow:**

Input Text → `normalize_unicode` → `decode_html_entities_and_refs` → `convert_emojis` → `transliterate_text` → `apply_custom_replacements` → `extract_words` → `filter_stopwords` → `join_words` → `to_lowercase` → `strip_separators` → `smart_truncate` → Final Slug

This modular approach makes it easy to add, remove, or modify steps in the pipeline. Each function is pure and well-documented. See the API docs and source for details on customizing or extending the pipeline.
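
The steps above can be approximated with the standard library alone. The sketch below is a hypothetical illustration, not sluggi's source: the function bodies, the `slugify_sketch` name, and the NFKD-plus-ASCII transliteration are assumptions, and the emoji step is left out. It shows how small pure functions compose into a slug.

```python
import html
import re
import unicodedata
from typing import Iterable, List, Mapping, Optional

# Hypothetical stand-ins that reuse the step names above; NOT sluggi's code.


def normalize_unicode(text: str) -> str:
    return unicodedata.normalize("NFKC", text)


def decode_html_entities_and_refs(text: str) -> str:
    return html.unescape(text)  # "&amp;" -> "&", "&#233;" -> "é"


def transliterate_text(text: str) -> str:
    # Crude approximation: split off accents, then drop every non-ASCII code point
    # (so emoji and non-Latin scripts simply vanish in this sketch).
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def apply_custom_replacements(text: str, custom_map: Optional[Mapping[str, str]]) -> str:
    for old, new in (custom_map or {}).items():
        text = text.replace(old, new)
    return text


def extract_words(text: str, word_regex: str) -> List[str]:
    return re.findall(word_regex, text)


def filter_stopwords(words: List[str], stopwords: Optional[Iterable[str]]) -> List[str]:
    stop = {w.lower() for w in (stopwords or ())}  # case-insensitive comparison
    return [w for w in words if w.lower() not in stop]


def to_lowercase(text: str, lowercase: bool) -> str:
    return text.lower() if lowercase else text


def smart_truncate(text: str, max_length: Optional[int], separator: str) -> str:
    if not max_length or len(text) <= max_length:
        return text
    if text.startswith(separator, max_length):  # cut already lands on a word boundary
        return text[:max_length]
    head, sep, _ = text[:max_length].rpartition(separator)
    return head if sep else text[:max_length]  # back off to the last whole word


def slugify_sketch(text: str, separator: str = "-", custom_map=None, stopwords=None,
                   lowercase: bool = True, word_regex: str = r"\w+",
                   max_length: Optional[int] = None) -> str:
    text = normalize_unicode(text)
    text = decode_html_entities_and_refs(text)
    text = transliterate_text(text)  # the convert_emojis step is skipped here
    text = apply_custom_replacements(text, custom_map)
    words = filter_stopwords(extract_words(text, word_regex), stopwords)
    slug = to_lowercase(separator.join(words), lowercase)  # join_words
    slug = slug.strip(separator)  # strip_separators (single-character separator)
    return smart_truncate(slug, max_length, separator)
```

Because each step is a plain function of its inputs, each one can be unit-tested in isolation, and swapping one step (for example, a real transliteration library) does not touch the others.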
## Installation
Install from PyPI:
```bash
pip install sluggi
```
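For the CLI and development extras, install from a clone of the repository (the quotes stop shells such as zsh from expanding the brackets):
```bash
pip install ".[cli,dev]"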
```
## Usage
```python
from sluggi import slugify, batch_slugify
slug = slugify("Hello, world!")
print(slug) # hello-world
# Batch processing (serial unless parallel=True is passed)
slugs = batch_slugify(["Hello, world!", "Привет мир"])
print(slugs) # ['hello-world', 'privet-mir']
# Advanced: parallel processing with a process pool
# (on Windows/macOS, run this under an `if __name__ == "__main__":` guard)
slugs = batch_slugify(["foo", "bar"], parallel=True, mode="process", workers=2)
# Stopwords (exclude common words from slugs)
slug = slugify("The quick brown fox jumps", stopwords=["the", "fox"])
print(slug) # quick-brown-jumps
slugs = batch_slugify([
"The quick brown fox jumps",
"Jump over the lazy dog"
], stopwords=["the", "over", "dog"])
print(slugs) # ['quick-brown-fox-jumps', 'jump-lazy']
# Custom regex pattern for word extraction (e.g., only extract capitalized words);
# the result is still lowercased unless lowercase=False is passed
slug = slugify("The Quick Brown Fox", word_regex=r"[A-Z][a-z]+")
print(slug) # the-quick-brown-fox
# Use in batch_slugify
slugs = batch_slugify([
    "The Quick Brown Fox",
    "Jump Over The Lazy Dog"
], word_regex=r"[A-Z][a-z]+")
print(slugs) # ['the-quick-brown-fox', 'jump-over-the-lazy-dog']
```
### Async Usage
Requires Python 3.9+ (the minimum version sluggi supports).
```python
import asyncio
from sluggi import async_slugify, async_batch_slugify
async def main():
    slug = await async_slugify("Hello, world!")
    slugs = await async_batch_slugify(["Hello, world!", "Привет мир"], parallel=True)
    print(slug)  # hello-world
    print(slugs)  # ['hello-world', 'privet-mir']
asyncio.run(main())
```
### Custom Separator
```python
slug = slugify("Hello, world!", separator="_")
print(slug) # hello_world
```
### Stopwords
```python
slug = slugify("The quick brown fox", stopwords=["the", "fox"])
print(slug) # quick-brown
```
### Custom Mapping
```python
slug = slugify("ä ö ü", custom_map={"ä": "ae", "ö": "oe", "ü": "ue"})
print(slug) # ae-oe-ue
```
---
## API Reference
### `slugify`
| Argument | Type | Default | Description |
|---------------|----------------|-----------|-----------------------------------------------------------------------------|
| text          | str            | (required)| The input string to slugify.                                                |
| separator | str | "-" | Word separator in the slug. |
| custom_map | dict | None | Custom character mappings. |
| stopwords | Iterable[str] | None | Words to exclude from the slug (case-insensitive if `lowercase=True`). |
| lowercase | bool | True | Convert result to lowercase. |
| word_regex | str | None | Custom regex pattern for word extraction (default: `r'\w+'`). |
| process_emoji | bool | True | If `False`, disables emoji-to-name conversion for max performance. |

**Returns:** `str` (slugified string)

**Example:**
```python
slug = slugify("The quick brown fox", stopwords=["the", "fox"])
print(slug) # quick-brown
```
---
### `batch_slugify`
| Argument | Type | Default | Description |
|---------------|----------------|--------------|---------------------------------------------------------------------|
| texts         | Iterable[str]  | (required)   | List of strings to slugify.                                         |
| separator | str | "-" | Word separator in the slug. |
| custom_map | dict | None | Custom character mappings. |
| stopwords | Iterable[str] | None | Words to exclude from slugs. |
| lowercase | bool | True | Convert result to lowercase. |
| word_regex | str | None | Custom regex pattern for word extraction (default: `r'\w+'`). |
| parallel | bool | False | Enable parallel processing. |
| workers | int | None | Number of parallel workers. |
| mode | str | "thread" | "thread", "process", or "serial". |
| chunk_size | int | 1000 | Number of items per worker chunk. |
| cache_size | int | 2048 | Size of the internal cache. |

**Returns:** `List[str]` (list of slugified strings)

**Example:**
```python
slugs = batch_slugify(["The quick brown fox", "Jumped over the lazy dog"])
print(slugs) # ['the-quick-brown-fox', 'jumped-over-the-lazy-dog']
```
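To make `mode`, `workers`, `chunk_size`, and `cache_size` concrete, here is a minimal sketch of one way such options can be dispatched with `concurrent.futures` and `functools.lru_cache`. It is an illustration under stated assumptions, not sluggi's internals: `simple_slug`, `batch_sketch`, and the chunking helpers are hypothetical, and `simple_slug` only lowercases and joins `\w+` runs.

```python
import re
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import lru_cache
from itertools import islice
from typing import Iterable, Iterator, List, Optional


@lru_cache(maxsize=2048)  # bounded cache, sized like the documented cache_size default
def simple_slug(text: str) -> str:
    # Hypothetical stand-in for slugify(): lowercase and join \w+ runs with "-".
    return "-".join(re.findall(r"\w+", text.lower()))


def _chunks(items: List[str], size: int) -> Iterator[List[str]]:
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def _slug_chunk(chunk: List[str]) -> List[str]:
    return [simple_slug(t) for t in chunk]


def batch_sketch(texts: Iterable[str], mode: str = "serial",
                 workers: Optional[int] = None, chunk_size: int = 1000) -> List[str]:
    items = list(texts)
    if mode == "serial":
        return _slug_chunk(items)
    if mode == "thread":
        executor = ThreadPoolExecutor(max_workers=workers)  # threads share one cache
    elif mode == "process":
        executor = ProcessPoolExecutor(max_workers=workers)  # each process has its own cache
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    with executor:
        # map() yields chunk results in submission order, so output order matches input.
        results = executor.map(_slug_chunk, _chunks(items, chunk_size))
        return [slug for chunk in results for slug in chunk]


if __name__ == "__main__":
    print(batch_sketch(["Hello, world!", "Foo Bar"], mode="thread", workers=2, chunk_size=1))
    # ['hello-world', 'foo-bar']
```

Threads share one in-process cache, so repeated inputs are cheap; worker processes each hold a separate cache, which is one reason caching brings little benefit in process mode. Process mode also pays to pickle every chunk, so larger `chunk_size` values amortize that cost.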
### `async_slugify(text, separator="-", custom_map=None)`
- Same as `slugify`, but async.
### `async_batch_slugify(texts, ...)`
- Same as `batch_slugify`, but async.
---
## Advanced Usage & Performance Tips
### Skipping Emoji Handling for Maximum Speed
- By default, sluggi converts emoji to their textual names (e.g., 😎 → smiley-face) for maximum compatibility and searchability.
- **For maximum performance**, you can disable emoji handling entirely if you do not need emoji-to-name conversion. This avoids all emoji detection and replacement logic, providing a measurable speedup for emoji-heavy or large datasets.
- To disable emoji handling:
- **Python API:** Pass `process_emoji=False` to `slugify`, `batch_slugify`, or any pipeline config.
- **CLI:** Add the `--no-process-emoji` flag to your command.
**Example:**
```python
slug = slugify("emoji: 😎🤖🎉", process_emoji=False)
print(slug) # emoji
```
```bash
sluggi slug "emoji: 😎🤖🎉" --no-process-emoji
# Output: emoji
```
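The emoji-to-name step can be approximated with Unicode character names from the standard library's `unicodedata` module. This is a hypothetical sketch: `emoji_to_words` and `slug_sketch` are illustrative names, and the names it produces (such as `smiling-face-with-sunglasses`) may differ from sluggi's own. It shows why skipping the step is cheaper: with `process_emoji=False` the per-character scan never runs and the emoji are simply dropped during word extraction.

```python
import re
import unicodedata


def emoji_to_words(text: str) -> str:
    out = []
    for ch in text:
        # "So" (Symbol, other) covers most emoji, but also symbols such as "©".
        if unicodedata.category(ch) == "So":
            out.append(" " + unicodedata.name(ch, "").lower() + " ")
        else:
            out.append(ch)
    return "".join(out)


def slug_sketch(text: str, process_emoji: bool = True) -> str:
    if process_emoji:
        text = emoji_to_words(text)  # the extra per-character pass
    # [a-z0-9]+ keeps only ASCII words, so unconverted emoji simply disappear.
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


if __name__ == "__main__":
    print(slug_sketch("emoji: 😎"))                       # emoji-smiling-face-with-sunglasses
    print(slug_sketch("emoji: 😎🤖🎉", process_emoji=False))  # emoji
```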
### Batch and Async Performance
- **Parallel Processing:**
- For large batches, use `parallel=True` and tune `workers` and `chunk_size`.
- `mode="process"` enables true CPU parallelism for CPU-bound workloads.
- `mode="thread"` is best for I/O-bound or repeated/cached inputs.
- **Caching:**
- Threaded mode enables slugification result caching for repeated or overlapping inputs.
- Process mode disables cache (each process is isolated).
- **Asyncio:**
- Use `async_batch_slugify` for async web servers or event-driven apps.
- The `parallel` option with async batch uses a semaphore to limit concurrency, avoiding event loop starvation.
- For best throughput, set `workers` to your CPU count or the number of concurrent requests you expect.
#### Example: Tuning Batch Processing
```python
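from fastapi import FastAPI
from sluggi import async_batch_slugify, batch_slugify

# Large batch, CPU-bound: use a process pool
# (run under an `if __name__ == "__main__":` guard on Windows/macOS)
my_list = ["Hello, world!", "Привет мир"] * 10_000  # example input
slugs = batch_slugify(my_list, parallel=True, mode="process", workers=8, chunk_size=500)

# Async batch in a web API (FastAPI shown here)
app = FastAPI()

@app.post("/bulk-slugify")
async def bulk_slugify(payload: list[str]):
    return await async_batch_slugify(payload, parallel=True, workers=8)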
```
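The semaphore-based concurrency limit described in the Asyncio notes can be sketched with `asyncio.Semaphore` and `asyncio.to_thread` (Python 3.9+). This is a hypothetical illustration of the pattern, not sluggi's `async_batch_slugify`; `simple_slug` and `async_batch_sketch` are stand-in names.

```python
import asyncio
import re
from typing import Iterable, List


def simple_slug(text: str) -> str:
    # Hypothetical stand-in for a synchronous slugify(); not sluggi's implementation.
    return "-".join(re.findall(r"\w+", text.lower()))


async def async_batch_sketch(texts: Iterable[str], workers: int = 8) -> List[str]:
    sem = asyncio.Semaphore(workers)  # caps how many items are in flight at once

    async def one(text: str) -> str:
        async with sem:
            # Offload the work to a thread so the event loop keeps serving other tasks.
            return await asyncio.to_thread(simple_slug, text)

    # gather() returns results in the same order as the input.
    return list(await asyncio.gather(*(one(t) for t in texts)))


if __name__ == "__main__":
    print(asyncio.run(async_batch_sketch(["Hello, world!", "Foo Bar"], workers=2)))
    # ['hello-world', 'foo-bar']
```

Bounding in-flight work with a semaphore keeps a very large payload from scheduling thousands of tasks at once, which is the starvation risk the notes above describe.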
### When to Use Serial vs Parallel vs Async
- **Serial:** Small batches, low latency, or single-threaded environments.
- **Parallel (thread/process):** Large batches, heavy CPU work, or when maximizing throughput is critical.
- **Async:** Integrate with modern async web frameworks, handle many concurrent requests, or avoid blocking the event loop.
See the docstrings and API reference for more details on each option.
## Command-Line Interface (CLI)
Install CLI dependencies (from a clone of the repository):
```bash
pip install ".[cli]"
```
### Quick Start
```bash
sluggi slug "Γειά σου Κόσμε"
# Output: geia-sou-kosme
sluggi slug "The quick brown fox jumps" --stopwords "the,fox"
# Output: quick-brown-jumps
sluggi slug "The Quick Brown Fox" --word-regex "[A-Z][a-z]+"
# Output: the-quick-brown-fox
sluggi slug "The Quick Brown Fox" --no-lowercase
# Output: The-Quick-Brown-Fox
sluggi batch --input names.txt --output slugs.txt
sluggi batch --input names.txt --word-regex "[A-Z][a-z]+" --no-lowercase
# Custom output formatting in batch mode:
sluggi batch --input names.txt --output-format "{line_num}: {original} -> {slug}"
# Output example:
# 1: Foo Bar -> foo-bar
# 2: Baz Qux -> baz-qux
sluggi batch --input names.txt --output-format "{slug}"
# Output: just the slug, as before
# Display results as a rich table in the console:
sluggi batch --input names.txt --display-output
# Output example (with rich):
# ┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━┓
# ┃ row_number ┃ original     ┃ slug     ┃
# ┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━┩
# │ 1          │ Foo Bar      │ foo-bar  │
# │ 2          │ Baz Qux      │ baz-qux  │
# └────────────┴──────────────┴──────────┘
```
**Supported placeholders for --output-format:**
- `{slug}`: The generated slug
- `{original}`: The original input line
- `{line_num}`: The 1-based line number
**Note:** The `--display-output` table uses the [rich](https://github.com/Textualize/rich) Python library. If not installed, a plain text table will be shown instead.
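
The placeholders behave like named fields in a Python format string. A minimal sketch, assuming `--output-format` is filled in with `str.format`; `format_batch` is a hypothetical helper and `simple_slug` is a stand-in for the real slugifier:

```python
import re
from typing import Iterable, Iterator


def simple_slug(text: str) -> str:
    # Hypothetical stand-in for slugify(); not sluggi's implementation.
    return "-".join(re.findall(r"\w+", text.lower()))


def format_batch(lines: Iterable[str], output_format: str = "{slug}") -> Iterator[str]:
    for line_num, original in enumerate(lines, start=1):  # 1-based, like {line_num}
        original = original.rstrip("\r\n")
        yield output_format.format(line_num=line_num, original=original,
                                   slug=simple_slug(original))


if __name__ == "__main__":
    for row in format_batch(["Foo Bar\n", "Baz Qux\n"], "{line_num}: {original} -> {slug}"):
        print(row)
    # 1: Foo Bar -> foo-bar
    # 2: Baz Qux -> baz-qux
```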
### CLI Options
| Option | Description |
|------------------|------------------------------------------------------------------|
| `--separator` | Separator for words in the slug (default: `-`). |
| `--stopwords` | Comma-separated words to exclude from slug. |
| `--custom-map`   | Custom mapping as JSON, e.g. `'{"ä": "ae"}'`.                    |
| `--word-regex`   | Custom regex pattern for word extraction (default: `\w+`).      |
| `--no-lowercase` | Preserve capitalization in the slug (default: False).           |
| `--output-format`| Custom output format for batch mode. Supports `{slug}`, `{original}`, `{line_num}`. Default: just the slug. |
| `--display-output`| Display results as a rich table in the console after batch processing. |
| `--no-process-emoji`| Skip emoji-to-name conversion for faster processing.          |
### CLI Help
```bash
sluggi --help
```
### Error Handling Example
```bash
sluggi batch --input missing.txt
# Output (rendered in bold red):
# Input file not found: missing.txt
```
## Development & Contributing
- Clone the repo:
```bash
git clone https://github.com/blip-box/sluggi.git
cd sluggi
```
- Create a virtual environment and install dependencies using uv:
```bash
uv venv
uv pip install ".[dev,cli]"
```
- Run tests and lints:
```bash
pytest
ruff check src/sluggi tests
black --check src/sluggi tests
```
- Pre-commit hooks:
```bash
pre-commit install
pre-commit run --all-files
```
- PRs and issues welcome!
## Encoding Notes
- Input and output files must be UTF-8 encoded.
- On Windows, use a UTF-8 capable terminal or set the environment variable `PYTHONUTF8=1` if you encounter encoding issues.
### Help and Examples
- Run `sluggi --help` or any subcommand with `--help` to see detailed usage and examples directly in your terminal.
---
## Performance & Benchmarks
Batch slugification performance was measured using the included benchmark script:
```bash
python scripts/benchmark_batch.py
```
**Results on 20,000 random strings:**
| Mode | Time (s) | Avg ms/item |
|----------|----------|-------------|
| Serial | 0.74 | 0.037 |
| Thread   | 0.62–0.72 | 0.031–0.036 |
| Process  | 1.55–1.73 | 0.078–0.086 |
- **Serial** is fast and reliable for most workloads.
- **Thread** mode may be slightly faster for I/O-bound or lightweight CPU tasks (default for --parallel).
- **Process** mode (multiprocessing) enables true CPU parallelism, but has higher overhead and is best for very CPU-bound or expensive slugification tasks.
- Use `--mode process` for multiprocessing, `--mode thread` for threads, or `--mode serial` for no parallelism. Combine with `--workers` to tune performance.
**Script location:** `scripts/benchmark_batch.py`
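
The **Avg ms/item** column is total time × 1000 / number of items; for the serial run, 0.74 s × 1000 / 20,000 = 0.037 ms. The sketch below is a hypothetical timing helper that computes both numbers with `time.perf_counter`; it is not `scripts/benchmark_batch.py`, and `simple_slug` is a stand-in for the real slugifier.

```python
import re
import time
from typing import Callable, List, Tuple


def simple_slug(text: str) -> str:
    # Hypothetical stand-in for slugify(); not sluggi's implementation.
    return "-".join(re.findall(r"\w+", text.lower()))


def serial(items: List[str]) -> List[str]:
    return [simple_slug(t) for t in items]


def time_batch(fn: Callable[[List[str]], List[str]], items: List[str]) -> Tuple[float, float]:
    """Return (total seconds, average milliseconds per item)."""
    start = time.perf_counter()
    fn(items)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed * 1000 / len(items)


if __name__ == "__main__":
    items = [f"Sample title number {i}" for i in range(20_000)]
    total, per_item = time_batch(serial, items)
    print(f"serial: {total:.2f} s, {per_item:.3f} ms/item")
```

Timings like these vary with hardware and input, so compare modes on the same machine and data.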
### Shell Completion
Enable tab-completion for your shell (bash, zsh, fish):
```bash
sluggi completion bash # or zsh, fish
# Follow the printed instructions to enable completion in your shell
```
## License
MIT

---
[Changelog (GitHub Releases)](https://github.com/blip-box/sluggi/releases)
> **Note:** This project is a complete rewrite, inspired by existing slugify libraries, but aims to set a new standard for speed, correctness, and extensibility in Python.
---
### See Also
This project was inspired by the Java library [slugify by akullpp](https://github.com/akullpp/slugify). If you need Java or Gradle support, see their documentation for advanced transliteration and custom replacements.
Example (Java):
```java
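import com.github.slugify.Slugify;
import java.util.Map;

// customReplacements(Map) and customReplacement(key, value) are two ways to
// register the same mappings; both are shown here for illustration.
final Slugify slg = Slugify.builder()
        .customReplacements(Map.of("Foo", "Hello", "bar", "world"))
        .customReplacement("Foo", "Hello")
        .customReplacement("bar", "world")
        .build();
final String result = slg.slugify("Foo, bar!");
// result: hello-world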
```
For advanced transliteration in Java:
```groovy
capabilities {
requireCapability('com.github.slugify:slugify-transliterator')
}
```
Or add the optional dependency `com.ibm.icu:icu4j` to your project.

---
✨ **New Automation & Collaboration Features**
- **Adaptive triage workflows**: Issues and PRs are now auto-labeled and parsed for agent/human status, and incomplete PRs are closed automatically, saving time for everyone.
- **Agent-ready templates**: All issue and PR templates are designed for both humans and autonomous agents, with structured metadata and feedback built in.
- **Playground workflow**: Safely experiment, test, or self-heal code with the new playground automation, perfect for bots and contributors alike.
See [.github/workflows/README.md](.github/workflows/README.md) for more details on these next-generation automations!
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-13 20:40:57",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "blip-box",
"github_project": "sluggi",
"travis_ci": false,
"coveralls": false,
"github_actions": true,
"lcname": "sluggi"
}