# English Syllabifier (eng_syl)
This is a GRU-based neural network designed for English word syllabification. The model was trained on data from the [Wikimorph](https://link.springer.com/chapter/10.1007/978-3-030-78270-2_72) dataset.
## Usage
Use the `syllabify()` method of the `Syllabel` class to syllabify a word:
> >>> from eng_syl.syllabify import Syllabel
> >>> syllabler = Syllabel()
> >>> syllabler.syllabify("chomsky")
> 'chom-sky'
`syllabify()` parameters:
- **text**: *string* - English text to be syllabified. Alphabetic input is expected; as of 4.0.3, non-alphabetic characters are ignored during syllabification and reinserted into the result.
`syllabify()` returns the given word with hyphens inserted at syllable boundaries.
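Because the result is a plain hyphen-delimited string, it can be post-processed with ordinary string operations. The sketch below works on the documented output `'chom-sky'` as a literal instead of calling the model. `split_syllables` and `count_syllables` are hypothetical helper names for illustration and are not part of `eng_syl`.

```python
from typing import List


def split_syllables(hyphenated: str) -> List[str]:
    """Split a hyphen-delimited syllabification into its syllables."""
    # Drop empty pieces so doubled hyphens do not produce empty syllables.
    return [part for part in hyphenated.split("-") if part]


def count_syllables(hyphenated: str) -> int:
    """Count the syllables in a hyphen-delimited syllabification."""
    return len(split_syllables(hyphenated))


# 'chom-sky' is the documented result of syllabify("chomsky").
print(split_syllables("chom-sky"))  # ['chom', 'sky']
print(count_syllables("chom-sky"))  # 2
```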
## Onceler (Onset, Nucleus, Coda Segmenter)
The `onc_split()` method of the `Onceler` class splits a single syllable into its constituent onset, nucleus, and coda.
> >>> from eng_syl.onceler import Onceler
> >>> onc = Onceler()
> >>> onc.onc_split("schmear")
> 'schm-ea-r'
`onc_split()` parameters:
- **text**: *string* - A single English syllable (a one-syllable word or a word component) to be segmented into onset, nucleus, and coda. Input should only contain alphabetic characters.
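To work with the three components separately, the segmented string can be parsed into named fields. This is a minimal sketch that uses the documented output `'schm-ea-r'` as a literal. The `ONC` tuple and `parse_onc` helper are hypothetical names, and the sketch assumes exactly three hyphen-separated parts, since the representation of an empty onset or coda is not described here.

```python
from typing import NamedTuple


class ONC(NamedTuple):
    onset: str
    nucleus: str
    coda: str


def parse_onc(segmented: str) -> ONC:
    """Parse an 'onset-nucleus-coda' string into named fields."""
    parts = segmented.split("-")
    if len(parts) != 3:
        # Assumption: anything other than three parts is rejected, because
        # the format for missing onsets or codas is not documented here.
        raise ValueError(f"expected three hyphen-separated parts, got {parts!r}")
    return ONC(*parts)


segments = parse_onc("schm-ea-r")
print(segments.onset, segments.nucleus, segments.coda)  # schm ea r
```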
## Phonify (Grapheme sequence to IPA estimation)
The `ipafy()` method of the `onc_to_phon` class tries to approximate an IPA pronunciation from a sequence of graphemes.
> >>> from eng_syl.phonify import onc_to_phon
> >>> otp = onc_to_phon()
> >>> print(otp.ipafy(['schm', 'ea', 'r']))
> ['ʃm', 'ɪ', 'r']
`ipafy()` parameters:
- **sequence**: *array of strings* - A sequence of viable English onsets, nuclei, and codas.
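The returned list keeps one IPA segment per input grapheme group. If a single transcription string is more convenient, the segments can simply be concatenated. The sketch below does this for the documented output; `join_ipa` is a hypothetical helper, and wrapping the result in slashes is only a display convention chosen for this example.

```python
from typing import Sequence


def join_ipa(segments: Sequence[str]) -> str:
    """Concatenate IPA segments and wrap them in slashes."""
    return "/" + "".join(segments) + "/"


# ['ʃm', 'ɪ', 'r'] is the documented result of ipafy(['schm', 'ea', 'r']).
print(join_ipa(["ʃm", "ɪ", "r"]))  # /ʃmɪr/
```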
# 4.0.2 Notes
Fixed a typo in `build_model()`, where an improper shape was being passed into `Input()`.
Reverted the class name to `Syllabel` (Syllabel -> Syllable -> Syllabel).
# 4.0.3 Notes
Added handling for non-alphabetic characters: `syllabify()` no longer breaks immediately if you pass a string like 'he23llotruc38k'. Instead, it syllabifies the alphabetic characters, ignoring everything else, and then reinserts the non-alphabetic characters into the hyphenated result -> 'he23l-lo-truc38k'. Prehyphenated words are handled the same way, so 'u-turn' -> 'u--turn'.
Also added an argument for returning the syllables as a list: `syllabify(word, return_list = False)`. Most strings should now be handled.
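The strip-and-reinsert behaviour described in these notes can be illustrated in plain Python. This is a sketch of the general technique, not the package's actual code: `reinsert_non_alpha` is a hypothetical function, and the letters-only syllabification (for example `'hel-lo-truck'`) is passed in by hand where the model would normally supply it.

```python
def reinsert_non_alpha(original: str, syllabified_letters: str) -> str:
    """Merge hyphens from a letters-only syllabification back into `original`.

    `syllabified_letters` is the hyphenated form of the alphabetic characters
    of `original`, e.g. 'hel-lo-truck' for 'he23llotruc38k'.
    """
    letters = "".join(ch for ch in original if ch.isalpha())
    if letters != syllabified_letters.replace("-", ""):
        raise ValueError("syllabification does not match the letters of the input")

    out = []
    pos = 0  # cursor into syllabified_letters
    for ch in original:
        if ch.isalpha():
            # Hyphens at the cursor mark a syllable boundary before this letter.
            while syllabified_letters[pos] == "-":
                out.append("-")
                pos += 1
            pos += 1  # consume the letter itself
        # Letters and non-letters alike are copied through unchanged.
        out.append(ch)
    return "".join(out)


print(reinsert_non_alpha("he23llotruc38k", "hel-lo-truck"))  # he23l-lo-truc38k
print(reinsert_non_alpha("u-turn", "u-turn"))  # u--turn
```

The second call shows why a prehyphenated word ends up with a doubled hyphen: the original hyphen is copied through as a non-alphabetic character, and the syllable boundary adds another one.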
# 4.0.4 Notes
Added the `save_clean` argument: `syllabify(word, save_clean = True)`. When `save_clean` is `True`, new words are saved to `self.clean` for future reference.
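Saving results for future reference amounts to a lookup cache in front of the model. The sketch below shows that pattern under stated assumptions: `CachedSyllabifier` and `fake_model` are hypothetical, and storing `clean` as a dictionary keyed by word is a guess, since the structure of `self.clean` is not described in these notes.

```python
from typing import Callable, Dict, List


class CachedSyllabifier:
    """Wrap a syllabifying callable with a dictionary cache."""

    def __init__(self, syllabify_fn: Callable[[str], str]) -> None:
        self._syllabify_fn = syllabify_fn
        # Assumption: the cache maps each word to its syllabification.
        self.clean: Dict[str, str] = {}

    def syllabify(self, word: str, save_clean: bool = True) -> str:
        if word in self.clean:
            return self.clean[word]  # cache hit: skip the expensive call
        result = self._syllabify_fn(word)
        if save_clean:
            self.clean[word] = result
        return result


calls: List[str] = []


def fake_model(word: str) -> str:
    # Stand-in for the neural model: records each call, returns a fixed answer.
    calls.append(word)
    return {"chomsky": "chom-sky"}.get(word, word)


cached = CachedSyllabifier(fake_model)
cached.syllabify("chomsky")
cached.syllabify("chomsky")  # served from the cache
print(len(calls))  # 1
```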
# 4.0.7 Notes
Added `evaluate_english_validity(syllable)` to `Syllabel`. It returns the onset-nucleus-coda decomposition, split by hyphens, if the syllable is likely pronounceable in English, and `False` otherwise.
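Since the return value is either a string or `False`, callers need to branch on the type before using it. The sketch below filters a list of candidate syllables this way. `keep_pronounceable` and `fake_evaluate` are hypothetical, and the stand-in evaluator uses hard-coded answers rather than the trained model.

```python
from typing import Callable, Iterable, List, Union


def keep_pronounceable(
    syllables: Iterable[str],
    evaluate: Callable[[str], Union[str, bool]],
) -> List[str]:
    """Return the decompositions of the syllables that `evaluate` accepts."""
    kept: List[str] = []
    for syllable in syllables:
        result = evaluate(syllable)
        # Only a string result counts as a valid decomposition.
        if isinstance(result, str):
            kept.append(result)
    return kept


def fake_evaluate(syllable: str) -> Union[str, bool]:
    # Stand-in with hard-coded answers, for illustration only.
    known = {"schmear": "schm-ea-r"}
    return known.get(syllable, False)


print(keep_pronounceable(["schmear", "zzkt"], fake_evaluate))  # ['schm-ea-r']
```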
Raw data
{
"_id": null,
"home_page": "https://github.com/timo-liu/eng-syl",
"name": "eng-syl",
"maintainer": null,
"docs_url": null,
"requires_python": null,
"maintainer_email": null,
"keywords": "Syllable, NLP, psycholinguistics",
"author": "Timothy-Liu",
"author_email": "timothys.new.email@gmail.com",
"download_url": null,
"platform": null,
"bugtrack_url": null,
"license": "MIT",
"summary": "English word syllabifier and extended syllable analysis tool",
"version": "4.0.7",
"project_urls": {
"Homepage": "https://github.com/timo-liu/eng-syl"
},
"split_keywords": [
"syllable",
" nlp",
" psycholinguistics"
],
"urls": [
{
"comment_text": "",
"digests": {
"blake2b_256": "5782ffa8978dd507e92d3645d673847a7740b61d75a127cdc7672955db4983f4",
"md5": "acc9a489926813130e438745f6c33142",
"sha256": "3c5d2da4209a611d02e473074f600d98703d8bc29f2891ce0c5cd1022de80a27"
},
"downloads": -1,
"filename": "eng_syl-4.0.7-py3-none-any.whl",
"has_sig": false,
"md5_digest": "acc9a489926813130e438745f6c33142",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": null,
"size": 86356402,
"upload_time": "2025-01-06T19:51:29",
"upload_time_iso_8601": "2025-01-06T19:51:29.752381Z",
"url": "https://files.pythonhosted.org/packages/57/82/ffa8978dd507e92d3645d673847a7740b61d75a127cdc7672955db4983f4/eng_syl-4.0.7-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-01-06 19:51:29",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "timo-liu",
"github_project": "eng-syl",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "eng-syl"
}