# Durak
<p align="center">
<img src="docs/durak.svg" alt="Durak logo" width="200" />
</p>
Durak is a Turkish natural language processing toolkit focused on reliable preprocessing building blocks. It offers configurable cleaning, tokenisation, stopword management, lemmatisation adapters, and frequency statistics so projects can bootstrap robust text pipelines quickly.
- Personal homepage: [karagoz.io](https://karagoz.io)
- Source repository: [github.com/fbkaragoz/durak](https://github.com/fbkaragoz/durak)
## Getting Started
Durak is under active development. The first public release will provide:
- Unicode-aware cleaning functions tuned for Turkish data sources.
- Tokenisation strategies ranging from regex to pluggable subword engines.
- Stopword curation helpers with domain-specific override support.
- Pluggable lemmatisation interface with adapters for Zemberek, spaCy, and Stanza.
- Frequency statistics utilities for exploratory corpus analysis.
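The following is a minimal sketch of how these building blocks could fit together, using only the Python standard library. Every name in it (`turkish_lower`, `clean`, `tokenize`, `build_stopwords`, `Lemmatizer`, `preprocess`, `term_frequencies`) is hypothetical and is not Durak's API. The stopword set is a small illustrative sample. The sketch shows one Turkish-specific pitfall: Python's `str.lower()` is not locale-aware, so `"I".lower()` gives `"i"` instead of the Turkish `"ı"`.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, Optional, Protocol

# Hypothetical helpers for illustration only; these are NOT Durak's actual API.

URL_RE = re.compile(r"https?://\S+")
# Words, optionally joined by an apostrophe so suffixed proper nouns such as
# "Ankara'ya" stay a single token.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*")

# A tiny illustrative sample of common Turkish function words, not a curated list.
DEFAULT_STOPWORDS = frozenset({"ve", "bir", "bu", "da", "de", "ile", "için"})


def turkish_lower(text: str) -> str:
    """Lowercase using the Turkish dotted/dotless I rules.

    str.lower() maps "I" to "i" and "İ" to "i" + U+0307, which is wrong for
    Turkish, so both capitals are mapped explicitly before generic lowercasing.
    """
    return text.replace("İ", "i").replace("I", "ı").lower()


def clean(text: str) -> str:
    # NFC first composes decomposed sequences (e.g. "s" + U+0327 -> "ş"),
    # then URLs are dropped, text is lowercased, and whitespace is collapsed.
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)
    text = turkish_lower(text)
    return " ".join(text.split())


def tokenize(text: str) -> list[str]:
    # The pattern only uses a non-capturing group, so findall returns whole matches.
    return TOKEN_RE.findall(text)


def build_stopwords(keep: Iterable[str] = (), extra: Iterable[str] = ()) -> frozenset[str]:
    # Domain overrides: `keep` removes words from the base list, `extra` adds words.
    return (DEFAULT_STOPWORDS - frozenset(keep)) | frozenset(extra)


class Lemmatizer(Protocol):
    """Adapter interface; a real adapter might wrap Zemberek, spaCy, or Stanza."""

    def lemmatize(self, token: str) -> str: ...


class IdentityLemmatizer:
    """Placeholder adapter that returns tokens unchanged."""

    def lemmatize(self, token: str) -> str:
        return token


def preprocess(
    text: str,
    lemmatizer: Optional[Lemmatizer] = None,
    keep: Iterable[str] = (),
    extra: Iterable[str] = (),
) -> list[str]:
    if lemmatizer is None:
        lemmatizer = IdentityLemmatizer()
    stopwords = build_stopwords(keep, extra)
    return [lemmatizer.lemmatize(tok) for tok in tokenize(clean(text)) if tok not in stopwords]


def term_frequencies(tokens: Iterable[str]) -> Counter:
    return Counter(tokens)


if __name__ == "__main__":
    tokens = preprocess("İstanbul ve Ankara'ya bir gezi. IŞIK ve ışık!")
    print(tokens)  # ['istanbul', "ankara'ya", 'gezi', 'ışık', 'ışık']
    print(term_frequencies(tokens).most_common(1))  # [('ışık', 2)]
```

Keeping the lemmatiser behind a small `Protocol` means a heavier backend can be swapped in without touching the cleaning, tokenisation, or stopword steps.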
## Contributing
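1. Create and activate a virtual environment: for example `python -m venv .venv` followed by `source .venv/bin/activate` (on Windows, `.venv\Scripts\activate`), or activate an existing conda environment with `conda activate nlp.env`.
2. From the repository root, install the package in editable mode with its development dependencies: `pip install -e ".[dev]"` (the quotes stop shells such as zsh from expanding the brackets).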
3. Run the test suite: `pytest`.
Roadmap and task planning live in `ROADMAP.md`. If you contribute, please update it as you make progress.
Raw data
{
"_id": null,
"home_page": null,
"name": "durak-nlp",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.9",
"maintainer_email": null,
"keywords": "turkish nlp, text processing, preprocessing, lemmatization, tokenization",
"author": "Fatih Burak Karag\u00f6z",
"author_email": null,
"download_url": "https://files.pythonhosted.org/packages/9b/58/bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107/durak_nlp-0.1.0.tar.gz",
"platform": null,
"bugtrack_url": null,
"license": null,
"summary": "Durak: modular Turkish NLP preprocessing toolkit.",
"version": "0.1.0",
"project_urls": {
"Homepage": "https://karagoz.io",
"Issues": "https://github.com/fbkaragoz/durak/issues",
"Repository": "https://github.com/fbkaragoz/durak"
},
"split_keywords": [
"turkish nlp",
" text processing",
" preprocessing",
" lemmatization",
" tokenization"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "62adcaa0dd8cf0e8874eaf3b60b480487ebf904cc09a129b1d29801667d719ea",
"md5": "99094785d8670c7805652824c2d4c372",
"sha256": "51855a54468e5f9cf5bc673cd6880f2c409df30d2079316bd105f40c3c7e95e3"
},
"downloads": -1,
"filename": "durak_nlp-0.1.0-py3-none-any.whl",
"has_sig": false,
"md5_digest": "99094785d8670c7805652824c2d4c372",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.9",
"size": 9223,
"upload_time": "2025-10-19T14:11:52",
"upload_time_iso_8601": "2025-10-19T14:11:52.709695Z",
"url": "https://files.pythonhosted.org/packages/62/ad/caa0dd8cf0e8874eaf3b60b480487ebf904cc09a129b1d29801667d719ea/durak_nlp-0.1.0-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "9b58bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107",
"md5": "66f2a74abdc624ec028c16a8d5c5bc44",
"sha256": "25e7245b43400f794ecaf7e875cb178990819faae7b95589dbbc966891b5d6e9"
},
"downloads": -1,
"filename": "durak_nlp-0.1.0.tar.gz",
"has_sig": false,
"md5_digest": "66f2a74abdc624ec028c16a8d5c5bc44",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.9",
"size": 11776,
"upload_time": "2025-10-19T14:11:54",
"upload_time_iso_8601": "2025-10-19T14:11:54.723949Z",
"url": "https://files.pythonhosted.org/packages/9b/58/bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107/durak_nlp-0.1.0.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-10-19 14:11:54",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "fbkaragoz",
"github_project": "durak",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "durak-nlp"
}