simple-word-filter


- Name: simple-word-filter
- Version: 1.0.3
- Summary: A simple word filtering library for Python.
- Upload time: 2025-10-26 08:41:08
- Requires Python: >=3.10
- License: MIT
- Keywords: filter, word, text, censor, profanity
- Requirements: none recorded

# simple-word-filter

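`simple-word-filter` is a lightweight, extensible sensitive-word filtering library for Python. It ships with several matching algorithms and is easy to integrate into text moderation or content filtering pipelines.

## Key features

- **Multiple matching modes**: three built-in matchers, `simple`, `regex`, and `trie`, let you trade accuracy against performance as needed.
- **Extensible architecture**: register a custom matcher with a decorator to support special matching strategies.
- **Uniform API**: methods such as `contains`, `match_all`, `match_first`, and `replace` behave consistently across all matchers.
- **Built-in benchmarking**: `WordFilter.matcher_speed_test` quickly compares how fast the different matchers run.
- **Modern Python**: built on Python 3.10+ type annotations, so the code is easy to read and maintain.

## Requirements

- Python 3.10 or later

## Installation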

```bash
pip install simple-word-filter
```

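## Quick start

```python
from simple_word_filter import WordFilter

# Blocked words: "sensitive word", "contraband", and "badword"
blocked = ["敏感词", "违禁品", "badword"]
wf = WordFilter(blocked, mode="trie")

# "This is a piece of text that contains a sensitive word"
text = "这是一段包含敏感词的文本"

wf.contains(text)
# True

wf.match_all(text)
# [('敏感词', 6)]  (matched word, start index; '敏感词' starts at index 6)

wf.replace(text, repl_char="*")
# '这是一段包含***的文本'
```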
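
To make the relationship between `match_all` and `replace` concrete, here is a minimal standalone sketch of masking. It assumes that each match is a `(word, start_index)` tuple. `mask_matches` is a hypothetical helper written for illustration, not part of the library, and the library's own `replace` may work differently.

```python
def mask_matches(text: str, matches: list[tuple[str, int]], repl_char: str = "*") -> str:
    """Mask every matched span in `text` with `repl_char` (hypothetical helper)."""
    chars = list(text)
    for word, start in matches:
        # Overwrite each character of the matched word in place.
        for i in range(start, start + len(word)):
            chars[i] = repl_char
    return "".join(chars)


masked = mask_matches("this has a badword in it", [("badword", 11)])
print(masked)  # this has a ******* in it
```

Masking one character per matched character keeps the output the same length as the input, so offsets reported for the original text remain valid for the masked text.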

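### Choosing a matching mode

| Mode     | Best for | Characteristics |
|----------|----------|------|
| `simple` | Small word lists; simplest implementation | Scans the text sequentially; easy to understand; moderate performance |
| `regex`  | When regular-expression capabilities are needed | Supports complex pattern matching; flexible, but costlier to construct |
| `trie`   | Large word lists; performance-critical use | Trie-based; efficient lookups |

Call `BaseMatcher.available_matchers()` to list the modes currently available.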

```python
from simple_word_filter import BaseMatcher

print(BaseMatcher.available_matchers())
# ['simple', 'regex', 'trie']
```
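
To illustrate why the mode table recommends a trie for large word lists, the following is a minimal standalone sketch of trie-based matching. It is not the library's implementation: `build_trie`, `trie_match_all`, and the `END` sentinel are hypothetical names, and the `(word, start_index)` result shape is an assumption.

```python
END = None  # Sentinel key marking the end of a word; never equal to a text character.


def build_trie(words: list[str]) -> dict:
    """Build a nested-dict trie from the blocked words."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def trie_match_all(text: str, trie: dict) -> list[tuple[str, int]]:
    """Return (word, start_index) for every blocked word found in `text`."""
    matches = []
    for start in range(len(text)):
        node = trie
        # Walk the trie from this start position until no branch matches.
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                matches.append((node[END], start))
    return matches


trie = build_trie(["foo", "bar", "foobar"])
print(trie_match_all("xfoobar", trie))  # [('foo', 1), ('foobar', 1), ('bar', 4)]
```

Each start position walks at most as deep as the longest blocked word, so the cost per position does not grow with the number of words in the list. That is the property that makes a trie attractive when the word list is large.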

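### Custom matchers

```python
from simple_word_filter import BaseMatcher

@BaseMatcher.matcher("suffix")
class SuffixMatcher(BaseMatcher):
    """Matches only blocked words that appear at the very end of the text."""

    def match_all(self, text: str):
        # Collect (word, start_index) for every blocked word the text ends with.
        matches = []
        for word in self._word_list:
            if text.endswith(word):
                matches.append((word, len(text) - len(word)))
        return matches

    def match_first(self, text: str):
        # Call match_all once and reuse the result.
        matches = self.match_all(text)
        return matches[0] if matches else None

# Once registered, the matcher can be used like a built-in mode.
```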

### Quick performance check

```python
from simple_word_filter import WordFilter

best_filter = WordFilter.matcher_speed_test(
    word_list=["foo", "bar", "baz"],
    sample_words=["foo", "bar", "baz", "qux"],
)

print(best_filter.mode)
# Prints the mode that ran fastest in the test
```
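
The idea behind such a speed test can be sketched with the standard-library `timeit` module. This standalone example does not call the library. `simple_contains` and `regex_contains` are hypothetical stand-ins for two matching strategies, and how `matcher_speed_test` measures time internally is not documented here.

```python
import re
import timeit


def simple_contains(words: list[str], text: str) -> bool:
    """Check each blocked word with a substring test."""
    return any(word in text for word in words)


def regex_contains(pattern: re.Pattern, text: str) -> bool:
    """Check all blocked words with one precompiled alternation."""
    return pattern.search(text) is not None


words = ["foo", "bar", "baz"]
samples = ["foo", "bar", "baz", "qux"]
pattern = re.compile("|".join(map(re.escape, words)))

# Each candidate runs the same workload so the timings are comparable.
candidates = {
    "simple": lambda: [simple_contains(words, s) for s in samples],
    "regex": lambda: [regex_contains(pattern, s) for s in samples],
}

timings = {name: timeit.timeit(fn, number=10_000) for name, fn in candidates.items()}
fastest = min(timings, key=timings.get)
print(fastest, timings)
```

Results depend on the word list, the sample texts, and the machine, so a benchmark is most useful when it runs on data that resembles production input.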

## Developer guide

```bash
git clone https://github.com/Sparrived/simple-word-filter.git
cd simple-word-filter
uv sync --dev  # or install the development dependencies with pip
```

Run the tests:

```bash
pytest
```

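## Release process

The repository is configured with GitHub Actions. Pushing a commit to `master` that changes `__version__` in `src/simple_word_filter/__init__.py` automatically:

1. Builds the distribution packages and uploads them to a GitHub Release (tagged `v<version>`).
2. Uploads the same artifacts to PyPI.

The `Upload Python Package` workflow can also be triggered manually on GitHub.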
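
As a rough illustration of how a release tag of the form `v<version>` can be derived from `__version__`, consider the sketch below. It is an assumption-laden example: `read_version` is a hypothetical helper, and the actual workflow may detect version changes in a different way.

```python
import re


def read_version(init_source: str) -> str:
    """Extract the __version__ string from the text of an __init__.py file."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', init_source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Inline sample text stands in for the contents of __init__.py.
source = '"""Package docstring."""\n__version__ = "1.0.3"\n'
tag = f"v{read_version(source)}"
print(tag)  # v1.0.3
```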

## License

MIT License © [Sparrived](https://github.com/Sparrived)


            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "simple-word-filter",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.10",
    "maintainer_email": "Sparrived <sparrived@outlook.com>",
    "keywords": "filter, word, text, censor, profanity",
    "author": null,
    "author_email": "Sparrived <sparrived@outlook.com>",
    "download_url": "https://files.pythonhosted.org/packages/8c/13/be9bb5ab806ad85a923317060d49ea803408356c71329da2cd62b8ddace2/simple_word_filter-1.0.3.tar.gz",
    "platform": null,
    "description": "(identical to the README text rendered above; omitted here)",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A simple word filtering library for Python.",
    "version": "1.0.3",
    "project_urls": {
        "Changelog": "https://github.com/Sparrived/simple-word-filter/blob/master/CHANGELOG.md",
        "Homepage": "https://github.com/Sparrived/simple-word-filter",
        "Issues": "https://github.com/Sparrived/simple-word-filter/issues",
        "Repository": "https://github.com/Sparrived/simple-word-filter"
    },
    "split_keywords": [
        "filter",
        " word",
        " text",
        " censor",
        " profanity"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "7d94e49b6969a3f7973392e4ed3f3098becf3eeefb1364260a5e70bded2634e9",
                "md5": "cdb72121b8a0f1a7e36db4bbf4becd17",
                "sha256": "2e07a90c934792d4a96dcf1453f2098024fba1433756b64270c5c6445a345733"
            },
            "downloads": -1,
            "filename": "simple_word_filter-1.0.3-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "cdb72121b8a0f1a7e36db4bbf4becd17",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.10",
            "size": 11491,
            "upload_time": "2025-10-26T08:41:06",
            "upload_time_iso_8601": "2025-10-26T08:41:06.917876Z",
            "url": "https://files.pythonhosted.org/packages/7d/94/e49b6969a3f7973392e4ed3f3098becf3eeefb1364260a5e70bded2634e9/simple_word_filter-1.0.3-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "8c13be9bb5ab806ad85a923317060d49ea803408356c71329da2cd62b8ddace2",
                "md5": "81cbfba9308e402268c5513af168b0fa",
                "sha256": "288bef09720ec8992961eca54718f55a94f09110f639f385769c38eb750e2859"
            },
            "downloads": -1,
            "filename": "simple_word_filter-1.0.3.tar.gz",
            "has_sig": false,
            "md5_digest": "81cbfba9308e402268c5513af168b0fa",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.10",
            "size": 10517,
            "upload_time": "2025-10-26T08:41:08",
            "upload_time_iso_8601": "2025-10-26T08:41:08.281766Z",
            "url": "https://files.pythonhosted.org/packages/8c/13/be9bb5ab806ad85a923317060d49ea803408356c71329da2cd62b8ddace2/simple_word_filter-1.0.3.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-26 08:41:08",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "Sparrived",
    "github_project": "simple-word-filter",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "lcname": "simple-word-filter"
}
        