safetext

Name	safetext
Version	0.3.2
home_page	None
Summary	Fast profanity filtering tool for multiple languages
upload_time	2025-08-06 20:33:16
maintainer	None
docs_url	None
author	Viddexa AI
requires_python	>=3.10
license	None
keywords	text profanity filtering moderation turkish english spanish arabic hindi chinese portuguese russian french german japanese persian azerbaijani
requirements	No requirements were recorded.

<div align="center">
  <p>
    <a align="center" href="" target="_blank">
      <img
        width="1280"
        src="https://github.com/viddexa/safetext/assets/44926076/9af66dde-3a93-4c5b-b802-cb31dffcb2e5"
      >
    </a>
  </p>

[![Context7 MCP](https://img.shields.io/badge/Context7%20MCP-Indexed-blue)](https://context7.com/viddexa/safetext)
[![llms.txt](https://img.shields.io/badge/llms.txt-✓-brightgreen)](https://context7.com/viddexa/safetext/llms.txt)
[![version](https://badge.fury.io/py/safetext.svg)](https://badge.fury.io/py/safetext)
[![downloads](https://pepy.tech/badge/safetext)](https://pepy.tech/project/safetext)
[![license](https://img.shields.io/pypi/l/safetext)](LICENSE)

</div>

## 🤔 why safetext?

**Fast profanity detection and filtering for 13 languages.**

- **Multi-format Detection**: Single words, phrases, and contextual profanity
- **Custom Word Lists**: Extend built-in lists with your own profanity words
- **Whitelisting**: Exclude specific words from detection
- **Auto Language Detection**: From text or subtitle files
- **Precise Filtering**: Exact position tracking and custom censoring
- **Simple Integration**: One-line setup with clean API

## 📦 installation

easily install **safetext** with pip:

```bash
pip install safetext
```

for development setup, see our [scripts documentation](scripts/README.md).

## 🎯 quickstart

### check and censor profanity

```python
>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
'Some text with ***.'
```
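
The positions returned by `check_profanity` can also drive your own masking when the default replacement is not what you want. The block below is a minimal sketch, not part of safetext: `mask_spans` is a hypothetical helper, and it assumes the result shape shown above with `end` being exclusive (which matches `start=15`, `end=31` for a 16-character word).

```python
def mask_spans(text, results, mask_char="*", keep_first=False):
    """Mask every detected span in `text`, preserving the original length.

    `results` is expected to look like the quickstart output: a list of
    dicts with integer 'start' and 'end' keys ('end' assumed exclusive).
    """
    chars = list(text)
    for hit in results:
        start, end = hit["start"], hit["end"]
        # Optionally leave the first character of each hit readable.
        first = start + 1 if keep_first else start
        for i in range(first, end):
            chars[i] = mask_char
    return "".join(chars)


# Result copied from the quickstart example above.
sample = "Some text with <profanity-word>."
hits = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

masked = mask_spans(sample, hits)
partially_masked = mask_spans(sample, hits, mask_char="#", keep_first=True)
print(masked)
print(partially_masked)
```

Because each span is replaced by the same number of characters, the offsets of later hits stay valid without any re-indexing.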

### extending profanity lists with custom words

Add your own profanity words by providing a custom words directory:

```python
# Directory structure:
# custom_profanity_words/
# ├── en.txt              # English custom words
# ├── tr.txt              # Turkish custom words
# └── es.txt              # Spanish custom words

>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')

>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]
```

Custom word files should contain one word/phrase per line:

```
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
```
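
If the custom lists are generated from another source (a spreadsheet, a database), the files can be written programmatically. This is a sketch under the layout described above, one `<language>.txt` file per language with one entry per line; `write_custom_words` is a hypothetical helper and is not provided by safetext.

```python
import tempfile
from pathlib import Path


def write_custom_words(directory, language, words):
    """Write one word or phrase per line to <directory>/<language>.txt."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [word.strip() for word in words if word.strip()]
    path = folder / f"{language}.txt"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


# Demonstrated in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    en_file = write_custom_words(
        Path(tmp) / "custom_profanity_words",
        "en",
        ["mycustomword", "inappropriate phrase", "company specific term"],
    )
    demo_content = en_file.read_text(encoding="utf-8")

print(demo_content)
```

In a real project you would pass a persistent folder such as `custom_profanity_words` and then hand that same folder to `SafeText(language='en', custom_words_dir=...)` as shown earlier.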

### using whitelist

exclude specific words from profanity detection:

```python
# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')

# Combining custom words with whitelist
>>> st = SafeText(
...     language='en', 
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
```
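
To make the effect of a whitelist concrete, the sketch below filters detection results after the fact. It only illustrates the idea and is not how safetext applies whitelists internally; `drop_whitelisted` is a hypothetical helper, the sample hits are hand-written in the documented result shape, and the case-insensitive comparison is an assumption of this sketch.

```python
def drop_whitelisted(results, whitelist):
    """Return only the hits whose 'word' is not in the whitelist."""
    # casefold() gives a case-insensitive comparison (an assumption here).
    allowed = {word.casefold() for word in whitelist}
    return [hit for hit in results if hit["word"].casefold() not in allowed]


# Hand-written hits for illustration, following the documented result shape.
text = "This mycustomword is allowedcustomword"
hits = [
    {"word": "mycustomword", "index": 2, "start": 5, "end": 17},
    {"word": "allowedcustomword", "index": 4, "start": 21, "end": 38},
]

remaining = drop_whitelisted(hits, ["AllowedCustomWord"])
print([hit["word"] for hit in remaining])
```

Passing `whitelist=` to the constructor, as in the examples above, remains the supported route; a post-filter like this is mainly useful when the allowed words vary per request.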

### automated language detection

- from text:

```python
>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'
```

- from .srt (subtitle) file:

```python
>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'
```
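
Language detection on subtitles only makes sense on the dialogue itself, not on cue numbers or timestamps. The following is a rough sketch of that preprocessing step, assuming a standard `.srt` layout; it is not safetext's implementation, and `srt_to_text` is a hypothetical helper.

```python
import re

# Matches cue timing lines such as "00:00:01,000 --> 00:00:03,500".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Matches simple formatting tags such as <i> and </i>.
TAG = re.compile(r"<[^>]+>")


def srt_to_text(srt_content):
    """Collapse the dialogue lines of an .srt document into one string."""
    dialogue = []
    for raw_line in srt_content.splitlines():
        line = raw_line.strip()
        # Skip blank separators, numeric cue counters, and timing lines.
        # Limitation: a dialogue line made only of digits is skipped too.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        dialogue.append(TAG.sub("", line))
    return " ".join(dialogue)


sample_srt = """1
00:00:01,000 --> 00:00:03,500
Merhaba, nasılsın?

2
00:00:04,000 --> 00:00:06,000
<i>İyiyim, teşekkürler.</i>
"""

plain_text = srt_to_text(sample_srt)
print(plain_text)
```

The resulting string could be passed to `set_language_from_text` when the subtitles are already in memory rather than in a file.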

## 🌍 supported languages

**safetext** currently supports profanity detection in 13 languages:

| Flag | ISO 639-1 Code | Language |
|----------|----------------|---------------|
| 🇸🇦 | `ar` | Arabic |
| 🇦🇿 | `az` | Azerbaijani |
| 🇩🇪 | `de` | German |
| 🇬🇧 | `en` | English |
| 🇪🇸 | `es` | Spanish |
| 🇮🇷 | `fa` | Persian (Farsi) |
| 🇫🇷 | `fr` | French |
| 🇮🇳 | `hi` | Hindi |
| 🇯🇵 | `ja` | Japanese |
| 🇵🇹 | `pt` | Portuguese |
| 🇷🇺 | `ru` | Russian |
| 🇹🇷 | `tr` | Turkish |
| 🇨🇳 | `zh` | Chinese |
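
When the language code comes from user input or configuration, it can help to validate it before constructing `SafeText`. This sketch hard-codes the table above; `SUPPORTED_LANGUAGES` and `normalize_language` are hypothetical names, not part of the safetext API, and the set of languages may differ in other versions.

```python
# Copied by hand from the supported languages table (version 0.3.2).
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def normalize_language(code):
    """Return a lowercase ISO 639-1 code, or raise if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language {code!r}; expected one of: {supported}"
        )
    return normalized


language = normalize_language(" TR ")
print(language, SUPPORTED_LANGUAGES[language])
```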

## 🤝 contribute to safetext

join our mission in refining content moderation!

contribute by:

- **adding new languages**: create a folder named after the language's ISO 639-1 code and include a `words.txt` file in it.
- **enhancing word lists**: improve detection accuracy.
- **sharing feedback**: your ideas can shape `safetext`.

see our [contributing guidelines](CONTRIBUTING.md) for development workflow, [test documentation](tests/README.md) for running tests, and [scripts guide](scripts/README.md) for automation tools.

______________________________________________________________________

## 🏆 contributors

meet our awesome contributors who make **safetext** better every day!

<p align="center">
    <a href="https://github.com/viddexa/safetext/graphs/contributors">
      <img src="https://contrib.rocks/image?repo=viddexa/safetext" />
    </a>
</p>

______________________________________________________________________

<div align="center">
  <b>follow us for more!</b>
  <br><br>
  <a href="https://www.linkedin.com/company/viddexa/">LinkedIn</a> • 
  <a href="https://huggingface.co/viddexa">Hugging Face</a> • 
  <a href="https://x.com/viddexa">X</a>
</div>

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "safetext",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": null,
    "keywords": "text, profanity, filtering, moderation, turkish, english, spanish, arabic, hindi, chinese, portuguese, russian, french, german, japanese, persian, azerbaijani",
    "author": "Viddexa AI",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/ad/bf/4980f41c1d7545a4a7e85c31721e234a35fe2451653a3f8f797ab53b8d6b/safetext-0.3.2.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": null,
    "summary": "Fast profanity filtering tool for multiple languages",
    "version": "0.3.2",
    "project_urls": {
        "Changelog": "https://github.com/viddexa/safetext/releases",
        "Homepage": "https://github.com/viddexa/safetext",
        "Issues": "https://github.com/viddexa/safetext/discussions/categories/q-a",
        "Source": "https://github.com/viddexa/safetext"
    },
    "split_keywords": [
        "text",
        " profanity",
        " filtering",
        " moderation",
        " turkish",
        " english",
        " spanish",
        " arabic",
        " hindi",
        " chinese",
        " portuguese",
        " russian",
        " french",
        " german",
        " japanese",
        " persian",
        " azerbaijani"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "cecd0bb7bfe8b105ecbd672b09804f99eabb47b657ddfa2a34fcd2db6645a2b9",
                "md5": "7dc9e2941aef2d5b9aeff0a15639b6eb",
                "sha256": "5e22d1f151c295d63684ced6756b047c74d58855e6c1ec9b37515ead8f76b160"
            },
            "downloads": -1,
            "filename": "safetext-0.3.2-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "7dc9e2941aef2d5b9aeff0a15639b6eb",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 82518,
            "upload_time": "2025-08-06T20:33:14",
            "upload_time_iso_8601": "2025-08-06T20:33:14.891969Z",
            "url": "https://files.pythonhosted.org/packages/ce/cd/0bb7bfe8b105ecbd672b09804f99eabb47b657ddfa2a34fcd2db6645a2b9/safetext-0.3.2-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "adbf4980f41c1d7545a4a7e85c31721e234a35fe2451653a3f8f797ab53b8d6b",
                "md5": "8848970c6190ba97413d4ac03b71e45d",
                "sha256": "8213fe8591a4d6409f5e7904f571315aea5690e8f462666101b3a313f0f2089c"
            },
            "downloads": -1,
            "filename": "safetext-0.3.2.tar.gz",
            "has_sig": false,
            "md5_digest": "8848970c6190ba97413d4ac03b71e45d",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 91546,
            "upload_time": "2025-08-06T20:33:16",
            "upload_time_iso_8601": "2025-08-06T20:33:16.222537Z",
            "url": "https://files.pythonhosted.org/packages/ad/bf/4980f41c1d7545a4a7e85c31721e234a35fe2451653a3f8f797ab53b8d6b/safetext-0.3.2.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-08-06 20:33:16",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "viddexa",
    "github_project": "safetext",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "safetext"
}
