truecase

| Field | Value |
|---|---|
| Name | truecase |
| Version | 0.0.14 |
| Home page | https://github.com/daltonfury42/truecase |
| Summary | A library to restore capitalization for text |
| Upload time | 2021-07-02 18:20:24 |
| Author | Dalton Fury |
| License | MIT |
| Requirements | No requirements were recorded. |

# TrueCase


![Main](https://github.com/daltonfury42/truecase/workflows/Main/badge.svg) ![Publish PyPI](https://github.com/daltonfury42/truecase/workflows/Publish%20Python%20distributions%20to%20PyPI/badge.svg)

A language-independent, statistical, language-model-based tool in Python that restores case information in text.

The model was inspired by the paper [Lucian Vlad Lita et al., tRuEcasIng](https://www.cs.cmu.edu/~llita/papers/lita.truecasing-acl2003.pdf), with some simplifications.


A model trained on the NLTK English corpus ships with the package by default,
and a script is provided to create models for other languages. The bundled model is
not perfect; for best results, train the system on a large and recent dataset
(e.g. a recent dump of Wikipedia).
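
To illustrate the statistical idea of learning casing from a corpus, here is a minimal sketch. It is not the package's implementation: it uses only per-word (unigram) counts, whereas a language-model approach also takes neighboring words into account. The names `train`, `restore_case`, and the three-sentence corpus are hypothetical and exist only for this example.

```python
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def train(sentences):
    """Count how often each cased form appears for every lowercased token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = TOKEN_RE.findall(sentence)
        # Skip the first token: its capital reflects sentence position, not the word.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def restore_case(text, counts):
    """Swap each word for its most frequent cased form, then capitalize the start."""
    def pick(match):
        word = match.group(0)
        forms = counts.get(word.lower())
        # Unknown words are left lowercased.
        return forms.most_common(1)[0][0] if forms else word

    restored = re.sub(r"\w+", pick, text.lower())
    return restored[:1].upper() + restored[1:]


# Toy corpus (an assumption for illustration; real training data should be large).
corpus = [
    "The weather in New York is mild.",
    "She moved to New York last year.",
    "Is the new phone out?",
]
model = train(corpus)
print(restore_case("hey, what is the weather in new york?", model))
# prints: Hey, what is the weather in New York?
```

Because this toy model ignores context, it would also capitalize "new" in "the new phone", which is the kind of error that context-aware scoring and a larger training set are meant to reduce.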

### Prerequisites

- Python 3

The project uses NLTK. Find installation instructions [here](https://www.nltk.org/install.html).

### Installing

```bash
pip3 install truecase
```

## Usage

Simple use case:

```python
>>> import truecase
>>> truecase.get_true_case('hey, what is the weather in new york?')
'Hey, what is the weather in New York?'
```
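
To process several sentences at once, the call above can be wrapped in a small helper. This is a sketch, not part of the package: `truecase_lines` and its `restore` parameter are hypothetical names, and the casing function is passed in as an argument so the helper does not depend on any particular model.

```python
from typing import Callable, Iterable, List


def truecase_lines(lines: Iterable[str], restore: Callable[[str], str]) -> List[str]:
    """Apply a truecasing function to each line, leaving blank lines untouched."""
    return [restore(line) if line.strip() else line for line in lines]
```

With the package installed, `truecase_lines(sentences, truecase.get_true_case)` would run each sentence through the bundled model.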

## Training your own model

TODO. For now, refer to `Trainer.py`.

## Contributing

I see a lot of room for improvement. Feel free to fork and improve. Do send a pull request.

## Authors

* **Dalton Fury** - *Initial work* - [daltonfury42](https://github.com/daltonfury42)

## License

This project is licensed under the MIT License. See the [LICENSE.md](LICENSE) file for details.

## Acknowledgments

* [Lucian Vlad Lita et al., tRuEcasIng](https://www.cs.cmu.edu/~llita/papers/lita.truecasing-acl2003.pdf)
* Borrowed a lot of code, and the idea, from [truecaser](https://github.com/nreimers/truecaser/blob/master/README.md) by [nreimers](https://github.com/nreimers)

Raw data

{
    "_id": null,
    "home_page": "https://github.com/daltonfury42/truecase",
    "name": "truecase",
    "maintainer": "",
    "docs_url": null,
    "requires_python": "",
    "maintainer_email": "",
    "keywords": "",
    "author": "Dalton Fury",
    "author_email": "daltonfury42@disroot.org",
    "download_url": "https://files.pythonhosted.org/packages/85/06/89d0adae754d32626dcd0dcd958a1b0be295a9e084a6ecda25af6ebbcdb2/truecase-0.0.14.tar.gz",
    "platform": "",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "A library to restore capitalization for text",
    "version": "0.0.14",
    "project_urls": {
        "Homepage": "https://github.com/daltonfury42/truecase"
    },
    "split_keywords": [],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "6aecca9dc9ab492aebc57af351709355d74d90e2b71c2b75befd2a1bf2c5db78",
                "md5": "c8cd5a9cb29c0859ba9af11c0d07c093",
                "sha256": "80e93b9d45a430d4bce4d9fe19fe0c185976ecf244779cf92e0901531ce86ced"
            },
            "downloads": -1,
            "filename": "truecase-0.0.14-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c8cd5a9cb29c0859ba9af11c0d07c093",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 28379904,
            "upload_time": "2021-07-02T18:20:19",
            "upload_time_iso_8601": "2021-07-02T18:20:19.886301Z",
            "url": "https://files.pythonhosted.org/packages/6a/ec/ca9dc9ab492aebc57af351709355d74d90e2b71c2b75befd2a1bf2c5db78/truecase-0.0.14-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "850689d0adae754d32626dcd0dcd958a1b0be295a9e084a6ecda25af6ebbcdb2",
                "md5": "ac240d326f37833a7276faff5e8e5b36",
                "sha256": "3a47b58c1724fcca7268cbeaf4056bb2c0cd041bd81f3b99a85ea263d7fc2d20"
            },
            "downloads": -1,
            "filename": "truecase-0.0.14.tar.gz",
            "has_sig": false,
            "md5_digest": "ac240d326f37833a7276faff5e8e5b36",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": null,
            "size": 28801090,
            "upload_time": "2021-07-02T18:20:24",
            "upload_time_iso_8601": "2021-07-02T18:20:24.114330Z",
            "url": "https://files.pythonhosted.org/packages/85/06/89d0adae754d32626dcd0dcd958a1b0be295a9e084a6ecda25af6ebbcdb2/truecase-0.0.14.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2021-07-02 18:20:24",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "daltonfury42",
    "github_project": "truecase",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": true,
    "requirements": [],
    "lcname": "truecase"
}
