eng-syl


Name: eng-syl
Version: 4.0.7
Home page: https://github.com/timo-liu/eng-syl
Summary: English word syllabifier and extended syllable analysis tool
Upload time: 2025-01-06 19:51:29
Maintainer: None
Docs URL: None
Author: Timothy-Liu
Requires Python: None
License: MIT
Keywords: syllable, nlp, psycholinguistics
Requirements: No requirements were recorded.
# English Syllabifier (eng_syl)
This is a GRU-based neural network designed for English word syllabification. The model was trained on data from the [Wikimorph](https://link.springer.com/chapter/10.1007/978-3-030-78270-2_72) dataset.

## Usage

Use the `syllabify()` method of the `Syllabel` class to syllabify words:

>     >>> from eng_syl.syllabify import Syllabel
>     >>> syllabler = Syllabel()
>     >>> syllabler.syllabify("chomsky")
>     'chom-sky'

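`syllabify()` parameters:

 - **text**: *string* - English text to be syllabified. Input is expected to be alphabetic; see the 4.0.3 notes for how non-alphabetic characters are handled.
 - **return_list**: *bool* (default `False`) - return the syllables as a list instead of a hyphenated string (added in 4.0.3).
 - **save_clean**: *bool* (default `True`) - save newly syllabified words to `self.clean` for future reference (added in 4.0.4).

By default, `syllabify()` returns the given word with hyphens inserted at syllable boundaries.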
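Sequence models such as the GRU described above are often trained to predict, for each character, whether a syllable boundary follows it. The sketch below converts between the hyphenated string that `syllabify()` returns and such per-character boundary labels. It only illustrates that framing: whether eng_syl encodes its training targets this way is an assumption, and `to_boundary_labels` / `from_boundary_labels` are hypothetical names, not part of the library.

```python
def to_boundary_labels(hyphenated: str) -> tuple[str, list[int]]:
    """Split a hyphenated word into its letters and per-letter boundary labels.

    labels[i] == 1 means a syllable boundary follows letters[i].
    """
    letters: list[str] = []
    labels: list[int] = []
    for ch in hyphenated:
        if ch == "-":
            # Mark the boundary on the most recent letter (a leading hyphen is ignored).
            if labels:
                labels[-1] = 1
        else:
            letters.append(ch)
            labels.append(0)
    return "".join(letters), labels


def from_boundary_labels(word: str, labels: list[int]) -> str:
    """Rebuild the hyphenated form from a word and its boundary labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    pieces: list[str] = []
    for ch, boundary in zip(word, labels):
        pieces.append(ch)
        if boundary:
            pieces.append("-")
    return "".join(pieces)


word, labels = to_boundary_labels("chom-sky")
print(word, labels)                        # chomsky [0, 0, 0, 1, 0, 0, 0]
print(from_boundary_labels(word, labels))  # chom-sky
```

Framed this way, syllabification becomes a binary tagging problem over characters, which is the kind of output a recurrent network can produce one position at a time.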

## Onceler (Onset, Nucleus, Coda Segmenter)

The `onc_split()` method of the `Onceler` class splits a single syllable into its Onset, Nucleus, and Coda components.

>     >>> from eng_syl.onceler import Onceler
>     >>> onc = Onceler()
>     >>> onc.onc_split("schmear")
>     'schm-ea-r'

 - **text**: *string* - an English single-syllable word or word component to be segmented into Onset, Nucleus, and Coda. Input should contain only alphabetic characters.
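The example above shows `onc_split()` returning the three components as one hyphen-separated string. A minimal sketch for turning that string into named fields follows. It assumes the result always has exactly three hyphen-separated fields (with an empty field where a component is absent), which the single README example does not confirm; `ONC` and `parse_onc` are hypothetical names, not eng_syl APIs.

```python
from typing import NamedTuple


class ONC(NamedTuple):
    """Onset, nucleus, and coda of one syllable (hypothetical helper type)."""
    onset: str
    nucleus: str
    coda: str


def parse_onc(split: str) -> ONC:
    """Parse an 'onset-nucleus-coda' string such as 'schm-ea-r'.

    Assumes exactly three hyphen-separated fields; anything else is rejected.
    """
    parts = split.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected 'onset-nucleus-coda', got {split!r}")
    return ONC(*parts)


print(parse_onc("schm-ea-r"))  # ONC(onset='schm', nucleus='ea', coda='r')
```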

## Phonify (Grapheme sequence to IPA estimation)

The `ipafy()` method of the `onc_to_phon` class approximates an IPA pronunciation from a sequence of graphemes.

>     >>> from eng_syl.phonify import onc_to_phon
>     >>> otp = onc_to_phon()
>     >>> otp.ipafy(['schm', 'ea', 'r'])
>     ['ʃm', 'ɪ', 'r']

 - **sequence**: *list of strings* - a sequence of viable English onsets, nuclei, and codas.
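The Onceler and Phonify examples chain naturally: the hyphen-separated output of `onc_split()` can be split into the list that `ipafy()` takes. The sketch below assumes exactly that correspondence and receives the two methods as plain callables, so it does not depend on how the classes are constructed. `syllable_to_ipa` is a hypothetical helper, not an eng_syl function.

```python
from typing import Callable, Sequence


def syllable_to_ipa(
    syllable: str,
    onc_split: Callable[[str], str],
    ipafy: Callable[[Sequence[str]], Sequence[str]],
) -> str:
    """Approximate the IPA for one syllable via onset/nucleus/coda segmentation."""
    graphemes = onc_split(syllable).split("-")  # e.g. 'schm-ea-r' -> ['schm', 'ea', 'r']
    return "".join(ipafy(graphemes))            # e.g. ['ʃm', 'ɪ', 'r'] -> 'ʃmɪr'
```

With `onc` and `otp` created as in the examples above, `syllable_to_ipa("schmear", onc.onc_split, otp.ipafy)` should give `'ʃmɪr'`, provided the two methods behave as the README examples show.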

# 4.0.2 Notes
Fixed a typo in build_model() that passed an improper shape to Input().
Reverted the class name to Syllabel (it had briefly been renamed from Syllabel to Syllable).

# 4.0.3 Notes
Added handling for non-alphabetic characters: syllabify() no longer fails immediately on a string like 'he23llotruc38k'. Instead, it syllabifies the alphabetic characters, ignoring the rest, and then reinserts the non-alphabetic characters into the hyphenated result: 'he23llotruc38k' -> 'he23l-lo-truc38k'. This also handles prehyphenated words, e.g. 'u-turn' -> 'u--turn'.
Also added a return_list argument for returning the syllables as a list: syllabify(word, return_list = False). syllabify() should now handle most strings.
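One way to implement the reinsertion described in these notes is sketched below. It assumes the syllabifier saw only the alphabetic characters, in order. It walks the original string, copies non-alphabetic characters through unchanged, and emits any pending syllable hyphens just before the next letter. This reproduces both examples from the notes (using the syllabification 'hel-lo-truck' implied by the first one), but it is a reconstruction from the described behavior, not eng_syl's code; `reinsert_non_alpha` is a hypothetical name.

```python
def reinsert_non_alpha(original: str, syllabified: str) -> str:
    """Merge the non-alphabetic characters of `original` into `syllabified`.

    `syllabified` must be the hyphenated form of the alphabetic characters of
    `original`, in order (letter case may differ).
    """
    out: list[str] = []
    j = 0  # current position in `syllabified`
    for ch in original:
        if not ch.isalpha():
            out.append(ch)  # digits, hyphens, etc. are copied through unchanged
            continue
        # Emit syllable-boundary hyphens that precede the next letter.
        while j < len(syllabified) and syllabified[j] == "-":
            out.append("-")
            j += 1
        if j >= len(syllabified) or syllabified[j].lower() != ch.lower():
            raise ValueError("syllabified text does not match the letters of original")
        out.append(ch)  # keep the original character (and its case)
        j += 1
    return "".join(out)


# The model would see only the letters: 'hellotruck' -> 'hel-lo-truck'.
print(reinsert_non_alpha("he23llotruc38k", "hel-lo-truck"))  # he23l-lo-truc38k
print(reinsert_non_alpha("u-turn", "u-turn"))                # u--turn
```

Emitting hyphens lazily, right before the next letter, is what places a non-alphabetic character at a boundary ahead of the syllable hyphen, which is why 'u-turn' comes out as 'u--turn'.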

# 4.0.4 Notes
Added the save_clean argument: syllabify(word, save_clean = True). When save_clean is True, newly syllabified words are saved to self.clean for future reference.
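The save_clean option amounts to memoization: results for words already seen are stored and reused. A minimal sketch of that pattern follows, assuming a dictionary keyed by the input word. How eng_syl actually structures `self.clean` is not documented here, and `CachedSyllabifier` and its `model` callable are hypothetical stand-ins for the neural network.

```python
from typing import Callable


class CachedSyllabifier:
    """Memoize a syllabification function (hypothetical illustration)."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self.model = model
        self.clean: dict[str, str] = {}  # word -> hyphenated result

    def syllabify(self, word: str, save_clean: bool = True) -> str:
        if word in self.clean:
            return self.clean[word]      # reuse a previously saved result
        result = self.model(word)        # otherwise run the (expensive) model
        if save_clean:
            self.clean[word] = result
        return result


calls: list[str] = []


def fake_model(word: str) -> str:
    """Stand-in for the network that records each call."""
    calls.append(word)
    return {"chomsky": "chom-sky"}.get(word, word)


s = CachedSyllabifier(fake_model)
s.syllabify("chomsky")
s.syllabify("chomsky")
print(len(calls))  # 1: the second call is served from s.clean
```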

# 4.0.7 Notes
Added evaluate_english_validity(syllable) to Syllabel. It returns the onset-nucleus-coda decomposition, split by hyphens, if the syllable is likely pronounceable in English, and False otherwise.
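Because evaluate_english_validity() returns either a hyphen-separated decomposition or False, checking the type of the result is clearer than relying on truthiness. The sketch below filters candidate syllables that way. The validator is passed in as a callable (for example `syllabler.evaluate_english_validity`, assuming it takes a single syllable as these notes describe), and `filter_valid_syllables` is a hypothetical helper.

```python
from typing import Callable, Iterable, Union


def filter_valid_syllables(
    candidates: Iterable[str],
    evaluate: Callable[[str], Union[str, bool]],
) -> dict[str, str]:
    """Keep candidates judged pronounceable, mapped to their decomposition."""
    kept: dict[str, str] = {}
    for syllable in candidates:
        result = evaluate(syllable)
        if isinstance(result, str):  # a decomposition such as 'schm-ea-r'; False otherwise
            kept[syllable] = result
    return kept
```

Usage, with `syllabler` created as in the Usage section: `filter_valid_syllables(["schmear", "xzq"], syllabler.evaluate_english_validity)`.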

            

Raw data

            {
    "_id": null,
    "home_page": "https://github.com/timo-liu/eng-syl",
    "name": "eng-syl",
    "maintainer": null,
    "docs_url": null,
    "requires_python": null,
    "maintainer_email": null,
    "keywords": "Syllable, NLP, psycholinguistics",
    "author": "Timothy-Liu",
    "author_email": "timothys.new.email@gmail.com",
    "download_url": null,
    "platform": null,
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "English word syllabifier and extended syllable analysis tool",
    "version": "4.0.7",
    "project_urls": {
        "Homepage": "https://github.com/timo-liu/eng-syl"
    },
    "split_keywords": [
        "syllable",
        " nlp",
        " psycholinguistics"
    ],
    "urls": [
        {
            "comment_text": "",
            "digests": {
                "blake2b_256": "5782ffa8978dd507e92d3645d673847a7740b61d75a127cdc7672955db4983f4",
                "md5": "acc9a489926813130e438745f6c33142",
                "sha256": "3c5d2da4209a611d02e473074f600d98703d8bc29f2891ce0c5cd1022de80a27"
            },
            "downloads": -1,
            "filename": "eng_syl-4.0.7-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "acc9a489926813130e438745f6c33142",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": null,
            "size": 86356402,
            "upload_time": "2025-01-06T19:51:29",
            "upload_time_iso_8601": "2025-01-06T19:51:29.752381Z",
            "url": "https://files.pythonhosted.org/packages/57/82/ffa8978dd507e92d3645d673847a7740b61d75a127cdc7672955db4983f4/eng_syl-4.0.7-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-01-06 19:51:29",
    "github": true,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "github_user": "timo-liu",
    "github_project": "eng-syl",
    "travis_ci": false,
    "coveralls": false,
    "github_actions": false,
    "lcname": "eng-syl"
}
        