abclf

* Name: abclf
* Version: 1.0.1
* Summary: Classify American vs. British English
* upload_time: 2023-03-21 10:52:29
* docs_url: None
* requires_python: >=3.7
* keywords: english variant, american, british
* requirements: No requirements were recorded.

# American-British-variety-classifier

A minimalistic spelling- and vocabulary-based American-vs-British variety classifier.

## Description

This classifier is based on the [VarCon database](http://wordlist.aspell.net/varcon/). The database was first read and used in its entirety, and was then progressively pruned to improve performance.


The classifier performs rudimentary preprocessing (deleting some unusual characters, to reduce the odds of words being discarded) and then checks whether each lowercase, non-numeric word is present in the dictionary. The final step assigns a variety to the input text using the following logic:
* for documents with no identified American or British lexemes, it returns `UNK`,
* if one variant has more than twice as many identified words as the other, it classifies the instance as the more frequent variant,
* otherwise, it classifies it as `MIX`.
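
The steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the toy lexicon, the regular expression used for cleaning, the function names `count_variants`, `assign_variety` and `classify`, and the `'A'` label are all assumptions made for the example (only `'B'`, `UNK` and `MIX` appear in this README). It also assumes that only tokens that are already lowercase are looked up.

```python
import re

# Toy stand-in for the pruned VarCon-derived dictionary (hypothetical entries).
TOY_LEXICON = {
    "color": "A", "colour": "B",
    "gray": "A", "grey": "B",
    "rumor": "A", "rumour": "B",
    "center": "A", "centre": "B",
}


def count_variants(text, lexicon=TOY_LEXICON):
    """Count American ('A') and British ('B') lexemes found in the text."""
    counts = {"A": 0, "B": 0}
    # Assumed preprocessing: turn punctuation and other symbols into spaces
    # so that attached characters do not hide a word from the lookup.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    for token in cleaned.split():
        # Only lowercase, purely alphabetic tokens are looked up (assumption).
        if token.islower() and token.isalpha():
            label = lexicon.get(token)
            if label is not None:
                counts[label] += 1
    return counts


def assign_variety(n_american, n_british):
    """Apply the three rules: UNK, clear majority, or MIX."""
    if n_american == 0 and n_british == 0:
        return "UNK"
    if n_american > 2 * n_british:
        return "A"
    if n_british > 2 * n_american:
        return "B"
    return "MIX"


def classify(text, lexicon=TOY_LEXICON):
    """Count variant lexemes in the text and map the counts to a label."""
    counts = count_variants(text, lexicon)
    return assign_variety(counts["A"], counts["B"])
```

Note that "more than twice as many" is a strict comparison, so two American words against one British word would still yield `MIX` in this sketch.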


## Installation

```bash
pip install abclf
```

## Use

```python
import abclf
text = "The flautist heard a rumour about the gray haired clarinettist in a wollen pullover"
abclf.get_variant(text)

# 'B'
```
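
To label several documents at once, one option is a small helper that tallies the returned labels. The helper name `summarize_variants` is hypothetical and not part of `abclf`; it accepts any callable that maps a string to a label, so `abclf.get_variant` can be passed in directly.

```python
from collections import Counter


def summarize_variants(texts, classify):
    """Tally the labels that `classify` returns for each text."""
    return Counter(classify(text) for text in texts)
```

With the package installed, `summarize_variants(documents, abclf.get_variant)` would return a `Counter` keyed by the labels the classifier produced.
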
## Authors

* Peter Rupnik
* Taja Kuzman
* Nikola Ljubešić



## Copyright of the original VarCon database

```
Copyright 2000-2020 by Kevin Atkinson (kevina@gnu.org) and Benjamin
Titze (btitze@protonmail.ch).

Copyright 2000-2019 by Kevin Atkinson

Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
granted without fee, provided that the above copyright notice appears
in all copies and that both that copyright notice and this permission
notice appear in supporting documentation. Kevin Atkinson makes no
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.

Copyright 2016 by Benjamin Titze

Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
granted without fee, provided that the above copyright notice appears
in all copies and that both that copyright notice and this permission
notice appear in supporting documentation. Benjamin Titze makes no
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.

Since the original words lists come from the Ispell distribution:

Copyright 1993, Geoff Kuenning, Granada Hills, CA
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. All modifications to the source code must be clearly marked as
   such.  Binary redistributions based on modified source code
   must be clearly marked as modified versions in the documentation
   and/or other materials provided with the distribution.
(clause 4 removed with permission from Geoff Kuenning)
5. The name of Geoff Kuenning may not be used to endorse or promote
   products derived from this software without specific prior
   written permission.

THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.  IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

```

            

Raw data

{
    "_id": null,
    "home_page": "",
    "name": "abclf",
    "maintainer": "",
    "docs_url": null,
    "requires_python": ">=3.7",
    "maintainer_email": "",
    "keywords": "English variant,American,British",
    "author": "",
    "author_email": "Peter Rupnik <peter.rupnik@ijs.si>",
    "download_url": "https://files.pythonhosted.org/packages/84/9d/320748f8c4a03df3e25201de7ab4ff74ae0f9a57b3a0428846fb975fc431/abclf-1.0.1.tar.gz",
    "platform": null,
    "bugtrack_url": null,
    "license": "",
    "summary": "Classify American vs. British English",
    "version": "1.0.1",
    "split_keywords": [
        "english variant",
        "american",
        "british"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5f5dd517ff20eb2b3b2b8951fef2fb9d045475baabc554dea62e81f47ad4f6f7",
                "md5": "aee5f033293ff95bc3753a148bc0ab83",
                "sha256": "063b050f1b000e5e28a02b57bc7dd22a4f056c36832080e0f8a79697a4e3c643"
            },
            "downloads": -1,
            "filename": "abclf-1.0.1-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "aee5f033293ff95bc3753a148bc0ab83",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.7",
            "size": 54425,
            "upload_time": "2023-03-21T10:52:25",
            "upload_time_iso_8601": "2023-03-21T10:52:25.624574Z",
            "url": "https://files.pythonhosted.org/packages/5f/5d/d517ff20eb2b3b2b8951fef2fb9d045475baabc554dea62e81f47ad4f6f7/abclf-1.0.1-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "849d320748f8c4a03df3e25201de7ab4ff74ae0f9a57b3a0428846fb975fc431",
                "md5": "e624da34d921d0906da334d79f3ff5d1",
                "sha256": "fd520ca76b3aaff8ca6bcc99c91fdb3d04e84ea4adf763ff6e58037aa3da9078"
            },
            "downloads": -1,
            "filename": "abclf-1.0.1.tar.gz",
            "has_sig": false,
            "md5_digest": "e624da34d921d0906da334d79f3ff5d1",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.7",
            "size": 56266,
            "upload_time": "2023-03-21T10:52:29",
            "upload_time_iso_8601": "2023-03-21T10:52:29.036376Z",
            "url": "https://files.pythonhosted.org/packages/84/9d/320748f8c4a03df3e25201de7ab4ff74ae0f9a57b3a0428846fb975fc431/abclf-1.0.1.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2023-03-21 10:52:29",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "lcname": "abclf"
}
        