durak-nlp


Name: durak-nlp
Version: 0.1.0
Home page: None
Summary: Durak: modular Turkish NLP preprocessing toolkit.
Upload time: 2025-10-19 14:11:54
Maintainer: None
Docs URL: None
Author: Fatih Burak Karagöz
Requires Python: >=3.9
License: None
Keywords: turkish nlp, text processing, preprocessing, lemmatization, tokenization
Requirements: No requirements were recorded.

# Durak

<p align="center">
  <img src="docs/durak.svg" alt="Durak logo" width="200" />
</p>

Durak is a Turkish natural language processing toolkit focused on reliable preprocessing building blocks. It offers configurable cleaning, tokenisation, stopword management, lemmatisation adapters, and frequency statistics so projects can bootstrap robust text pipelines quickly.
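The README does not document Durak's public API, so the following is a plain-Python sketch (not Durak's API) of the kind of pipeline described above: cleaning, tokenisation, stopword filtering with overrides, and frequency counts. All names (`turkish_lower`, `clean`, `tokenize`, `BASE_STOPWORDS`, `build_stopwords`, `preprocess`) are hypothetical, and the stopword set is a tiny illustrative sample rather than a curated list.

```python
import re
import unicodedata
from collections import Counter
from typing import FrozenSet, Iterable, List, Optional, Set

# Hypothetical helper, not part of Durak: map the Turkish capitals before lowercasing.
# Plain str.lower() turns "I" into "i" (Turkish needs dotless "ı") and "İ" into
# "i" + U+0307 COMBINING DOT ABOVE (Turkish needs plain "i").
_TR_CASE_MAP = str.maketrans({"I": "ı", "İ": "i"})

_URL_RE = re.compile(r"https?://\S+")
_WS_RE = re.compile(r"\s+")
# Word characters, optionally joined by an apostrophe, so "Ankara'da" stays one token.
_TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*")

# Tiny illustrative sample, not a curated Turkish stopword list.
BASE_STOPWORDS: FrozenSet[str] = frozenset({"ve", "bir", "bu", "da", "de", "için", "ile"})


def turkish_lower(text: str) -> str:
    return text.translate(_TR_CASE_MAP).lower()


def clean(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # compose "I" + U+0307 into "İ"
    text = _URL_RE.sub(" ", text)               # drop URLs
    text = turkish_lower(text)
    return _WS_RE.sub(" ", text).strip()        # collapse runs of whitespace


def tokenize(text: str) -> List[str]:
    return _TOKEN_RE.findall(text)


def build_stopwords(extra: Iterable[str] = (), keep: Iterable[str] = ()) -> Set[str]:
    # Domain-specific overrides: add the `extra` words, then un-stop the `keep` words.
    return (set(BASE_STOPWORDS) | set(extra)) - set(keep)


def preprocess(text: str, stopwords: Optional[Iterable[str]] = None) -> List[str]:
    stop = BASE_STOPWORDS if stopwords is None else set(stopwords)
    return [tok for tok in tokenize(clean(text)) if tok not in stop]


if __name__ == "__main__":
    sample = "İstanbul ve Ankara'da bir TOPLANTI yapıldı. Toplantı için https://example.com"
    tokens = preprocess(sample)
    print(tokens)
    print(Counter(tokens).most_common(1))
```

The Turkish-specific step is the casing: plain `str.lower()` turns `TOPLANTI` into `toplanti`, which would then be counted separately from `toplantı` in the frequency table.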

- Personal homepage: [karagoz.io](https://karagoz.io)
- Source repository: [github.com/fbkaragoz/durak](https://github.com/fbkaragoz/durak)

## Getting Started

Durak is under active development. The first public release will provide:

- Unicode-aware cleaning functions tuned for Turkish data sources.
- Tokenisation strategies ranging from regex to pluggable subword engines.
- Stopword curation helpers with domain-specific override support.
- Pluggable lemmatisation interface with adapters for Zemberek, spaCy, and Stanza.
- Frequency statistics utilities for exploratory corpus analysis.
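The list mentions a pluggable lemmatisation interface with adapters for Zemberek, spaCy, and Stanza. The README does not show that interface, so the sketch below only illustrates the general pattern with a structural `typing.Protocol`. `Lemmatizer`, `LookupLemmatizer`, `SuffixStripLemmatizer`, and `lemmatize_all` are hypothetical names, and the two toy backends stand in for real adapters rather than reproducing any of those libraries.

```python
from typing import Dict, Iterable, List, Protocol


class Lemmatizer(Protocol):
    """Hypothetical interface: any object with this method can act as a backend."""

    def lemmatize(self, token: str) -> str: ...


class LookupLemmatizer:
    """Toy adapter backed by a dict; unknown tokens are returned unchanged."""

    def __init__(self, table: Dict[str, str]) -> None:
        self._table = dict(table)

    def lemmatize(self, token: str) -> str:
        return self._table.get(token, token)


class SuffixStripLemmatizer:
    """Naive heuristic: strip the longest listed suffix, keeping a minimum stem length."""

    def __init__(self, suffixes: Iterable[str], min_stem: int = 2) -> None:
        # Ignore empty suffixes and try longer ones first.
        self._suffixes = sorted((s for s in suffixes if s), key=len, reverse=True)
        self._min_stem = min_stem

    def lemmatize(self, token: str) -> str:
        for suffix in self._suffixes:
            if token.endswith(suffix) and len(token) - len(suffix) >= self._min_stem:
                return token[: -len(suffix)]
        return token


def lemmatize_all(tokens: Iterable[str], lemmatizer: Lemmatizer) -> List[str]:
    # Pipeline code depends only on the protocol, so backends can be swapped freely.
    return [lemmatizer.lemmatize(tok) for tok in tokens]
```

Because `Protocol` is structural, an adapter wrapping a real backend would only need a matching `lemmatize` method, with no shared base class. Suffix stripping is a crude placeholder: Turkish morphology (vowel harmony, consonant alternation, stacked suffixes) is the reason dedicated morphological analysers such as Zemberek are used.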

## Contributing

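1. Create and activate a virtual environment, e.g. `python -m venv .venv` followed by `source .venv/bin/activate`, or activate an existing conda environment with `conda activate nlp.env`.
2. Install the package in editable mode with its development dependencies: `pip install -e ".[dev]"` (the quotes stop shells such as zsh from expanding the brackets).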
3. Run the test suite: `pytest`.

Roadmap and task planning live in `ROADMAP.md`. If you contribute, update the roadmap as you make progress.

            

Raw data

{
    "_id": null,
    "home_page": null,
    "name": "durak-nlp",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.9",
    "maintainer_email": null,
    "keywords": "turkish nlp, text processing, preprocessing, lemmatization, tokenization",
    "author": "Fatih Burak Karag\u00f6z",
    "author_email": null,
    "download_url": "https://files.pythonhosted.org/packages/9b/58/bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107/durak_nlp-0.1.0.tar.gz",
    "platform": null,
    "description": "# Durak\n\n<p align=\"center\">\n  <img src=\"docs/durak.svg\" alt=\"Durak logo\" width=\"200\" />\n</p>\n\nDurak is a Turkish natural language processing toolkit focused on reliable preprocessing building blocks. It offers configurable cleaning, tokenisation, stopword management, lemmatisation adapters, and frequency statistics so projects can bootstrap robust text pipelines quickly.\n\n- Personal homepage: [karagoz.io](https://karagoz.io)\n- Source repository: [github.com/fbkaragoz/durak](https://github.com/fbkaragoz/durak)\n\n## Getting Started\n\nDurak is under active development. The first public release will provide:\n\n- Unicode-aware cleaning functions tuned for Turkish data sources.\n- Tokenisation strategies ranging from regex to pluggable subword engines.\n- Stopword curation helpers with domain-specific override support.\n- Pluggable lemmatisation interface with adapters for Zemberek, spaCy, and Stanza.\n- Frequency statistics utilities for exploratory corpus analysis.\n\n## Contributing\n\n1. Create a virtual environment (`conda activate nlp.env` or `python -m venv .venv`).\n2. Install development dependencies: `pip install -e .[dev]`.\n3. Run the test suite: `pytest`.\n\nRoadmap and task planning live in `ROADMAP.md`. Update the roadmap as you make progress if you want to contribute.\n",
    "bugtrack_url": null,
    "license": null,
    "summary": "Durak: modular Turkish NLP preprocessing toolkit.",
    "version": "0.1.0",
    "project_urls": {
        "Homepage": "https://karagoz.io",
        "Issues": "https://github.com/fbkaragoz/durak/issues",
        "Repository": "https://github.com/fbkaragoz/durak"
    },
    "split_keywords": [
        "turkish nlp",
        " text processing",
        " preprocessing",
        " lemmatization",
        " tokenization"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "62adcaa0dd8cf0e8874eaf3b60b480487ebf904cc09a129b1d29801667d719ea",
                "md5": "99094785d8670c7805652824c2d4c372",
                "sha256": "51855a54468e5f9cf5bc673cd6880f2c409df30d2079316bd105f40c3c7e95e3"
            },
            "downloads": -1,
            "filename": "durak_nlp-0.1.0-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "99094785d8670c7805652824c2d4c372",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.9",
            "size": 9223,
            "upload_time": "2025-10-19T14:11:52",
            "upload_time_iso_8601": "2025-10-19T14:11:52.709695Z",
            "url": "https://files.pythonhosted.org/packages/62/ad/caa0dd8cf0e8874eaf3b60b480487ebf904cc09a129b1d29801667d719ea/durak_nlp-0.1.0-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "9b58bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107",
                "md5": "66f2a74abdc624ec028c16a8d5c5bc44",
                "sha256": "25e7245b43400f794ecaf7e875cb178990819faae7b95589dbbc966891b5d6e9"
            },
            "downloads": -1,
            "filename": "durak_nlp-0.1.0.tar.gz",
            "has_sig": false,
            "md5_digest": "66f2a74abdc624ec028c16a8d5c5bc44",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.9",
            "size": 11776,
            "upload_time": "2025-10-19T14:11:54",
            "upload_time_iso_8601": "2025-10-19T14:11:54.723949Z",
            "url": "https://files.pythonhosted.org/packages/9b/58/bebd1956603f245e4311f51c55ac2a07b0a4f0cb3e2fa89639b099a08107/durak_nlp-0.1.0.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-10-19 14:11:54",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "fbkaragoz",
    "github_project": "durak",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "durak-nlp"
}