durak-nlp

Name	durak-nlp
Version	0.1.0
home_page	None
Summary	Durak: modular Turkish NLP preprocessing toolkit.
upload_time	2025-10-19 14:11:54
maintainer	None
docs_url	None
author	Fatih Burak Karagöz
requires_python	>=3.9
license	None
keywords	turkish nlp, text processing, preprocessing, lemmatization, tokenization
VCS
bugtrack_url
requirements	No requirements were recorded.
Travis-CI	No Travis.
coveralls test coverage	No coveralls.

# Durak

<p align="center">
  <img src="docs/durak.svg" alt="Durak logo" width="200" />
</p>

Durak is a Turkish natural language processing toolkit focused on reliable preprocessing building blocks. It offers configurable cleaning, tokenisation, stopword management, lemmatisation adapters, and frequency statistics so projects can bootstrap robust text pipelines quickly.

- Personal homepage: [karagoz.io](https://karagoz.io)
- Source repository: [github.com/fbkaragoz/durak](https://github.com/fbkaragoz/durak)

## Getting Started

Durak is under active development. The first public release will provide:

- Unicode-aware cleaning functions tuned for Turkish data sources.
- Tokenisation strategies ranging from regex to pluggable subword engines.
- Stopword curation helpers with domain-specific override support.
- Pluggable lemmatisation interface with adapters for Zemberek, spaCy, and Stanza.
- Frequency statistics utilities for exploratory corpus analysis.
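
The building blocks listed above can be illustrated with a minimal, standard-library-only sketch. It is not Durak's API: every name here (`turkish_lower`, `tokenize`, `build_stopwords`, `token_frequencies`, and the tiny `DEFAULT_STOPWORDS` set) is hypothetical and chosen only for illustration. The sketch shows why Turkish needs its own casing step (`I` lowercases to `ı` and `İ` to `i`), followed by regex tokenisation, stopword overrides, a pluggable lemmatiser hook, and frequency counts.

```python
import re
import unicodedata
from collections import Counter
from typing import Callable, Iterable, List, Optional, Set

# Hypothetical helpers for illustration only; they are NOT part of Durak.

# Words made of Unicode word characters, optionally joined by an apostrophe
# (Turkish attaches suffixes to proper nouns this way, e.g. "İstanbul'da").
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*")

# A deliberately tiny stopword set (assumption), just enough for the demo.
DEFAULT_STOPWORDS = frozenset({"ve", "bir", "bu", "da", "de", "ile"})


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish casing rules: I -> ı and İ -> i."""
    # NFC first, so a decomposed "I + combining dot" becomes a single "İ".
    text = unicodedata.normalize("NFC", text)
    # Map the two special letters before the generic lower(), which would
    # otherwise turn "I" into "i" and "İ" into "i" plus a combining dot.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text: str) -> List[str]:
    """Split text into word tokens with a Unicode-aware regex."""
    return TOKEN_RE.findall(text)


def build_stopwords(extra: Iterable[str] = (), keep: Iterable[str] = ()) -> Set[str]:
    """Start from the defaults, add domain-specific words, then drop exceptions."""
    return (set(DEFAULT_STOPWORDS) | set(extra)) - set(keep)


def token_frequencies(
    text: str,
    stopwords: Optional[Set[str]] = None,
    lemmatize: Callable[[str], str] = lambda token: token,
) -> Counter:
    """Clean, tokenise, filter, and lemmatise text, then count the results.

    `lemmatize` is a plug-in point: any callable mapping a token to its lemma
    can be passed in. The default is the identity function.
    """
    if stopwords is None:
        stopwords = set(DEFAULT_STOPWORDS)
    tokens = tokenize(turkish_lower(text))
    return Counter(lemmatize(t) for t in tokens if t not in stopwords)


counts = token_frequencies("Işık ve IŞIK: İstanbul'da bir ışık.")
print(counts.most_common(2))
```

Passing the lemmatiser as a plain callable keeps the pipeline independent of any particular backend, so a wrapper around an external engine could be swapped in without touching the other steps.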

## Contributing

1. Create and activate a virtual environment (`python -m venv .venv`, or activate an existing conda environment with `conda activate nlp.env`).
2. Install development dependencies: `pip install -e ".[dev]"` (the quotes keep shells such as zsh from expanding the brackets).
3. Run the test suite: `pytest`.

Roadmap and task planning live in `ROADMAP.md`. If you contribute, update the roadmap as you make progress.

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "durak-nlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "turkish nlp, text processing, preprocessing, lemmatization, tokenization",
    "author": "Fatih Burak Karag\u00f6z",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/9b/58/bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107/durak_nlp-0.1.0.tar.gz",
    "platform": null,
    "description": "# Durak\n\n<p align=\"center\">\n  <img src=\"docs/durak.svg\" alt=\"Durak logo\" width=\"200\" />\n</p>\n\nDurak is a Turkish natural language processing toolkit focused on reliable preprocessing building blocks. It offers configurable cleaning, tokenisation, stopword management, lemmatisation adapters, and frequency statistics so projects can bootstrap robust text pipelines quickly.\n\n- Personal homepage: [karagoz.io](https://karagoz.io)\n- Source repository: [github.com/fbkaragoz/durak](https://github.com/fbkaragoz/durak)\n\n## Getting Started\n\nDurak is under active development. The first public release will provide:\n\n- Unicode-aware cleaning functions tuned for Turkish data sources.\n- Tokenisation strategies ranging from regex to pluggable subword engines.\n- Stopword curation helpers with domain-specific override support.\n- Pluggable lemmatisation interface with adapters for Zemberek, spaCy, and Stanza.\n- Frequency statistics utilities for exploratory corpus analysis.\n\n## Contributing\n\n1. Create a virtual environment (`conda activate nlp.env` or `python -m venv .venv`).\n2. Install development dependencies: `pip install -e .[dev]`.\n3. Run the test suite: `pytest`.\n\nRoadmap and task planning live in `ROADMAP.md`. Update the roadmap as you make progress if you want to contribute.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Durak: modular Turkish NLP preprocessing toolkit.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://karagoz.io",
        "Issues": "https://github.com/fbkaragoz/durak/issues",
        "Repository": "https://github.com/fbkaragoz/durak"
    },
    "split_keywords": [
        "turkish nlp",
        " text processing",
        " preprocessing",
        " lemmatization",
        " tokenization"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "62adcaa0dd8cf0e8874eaf3b60b480487ebf904cc09a129b1d29801667d719ea",
                "md5": "99094785d8670c7805652824c2d4c372",
                "sha256": "51855a54468e5f9cf5bc673cd6880f2c409df30d2079316bd105f40c3c7e95e3"
            },
            "downloads": -1,
            "filename": "durak_nlp-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "99094785d8670c7805652824c2d4c372",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 9223,
            "upload_time": "2025-10-19T14:11:52",
            "upload_time_iso_8601": "2025-10-19T14:11:52.709695Z",
            "url": "https://files.pythonhosted.org/packages/62/ad/caa0dd8cf0e8874eaf3b60b480487ebf904cc09a129b1d29801667d719ea/durak_nlp-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9b58bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107",
                "md5": "66f2a74abdc624ec028c16a8d5c5bc44",
                "sha256": "25e7245b43400f794ecaf7e875cb178990819faae7b95589dbbc966891b5d6e9"
            },
            "downloads": -1,
            "filename": "durak_nlp-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "66f2a74abdc624ec028c16a8d5c5bc44",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 11776,
            "upload_time": "2025-10-19T14:11:54",
            "upload_time_iso_8601": "2025-10-19T14:11:54.723949Z",
            "url": "https://files.pythonhosted.org/packages/9b/58/bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107/durak_nlp-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-19 14:11:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fbkaragoz",
    "github_project": "durak",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "durak-nlp"
}
