| Name | somelang |
| --- | --- |
| Version | 0.0.3 |
| download | |
| home_page | https://github.com/SomeAB/somelang |
| Summary | Language Detection Library |
| upload_time | 2025-09-05 10:23:34 |
| maintainer | None |
| docs_url | None |
| author | SomeAB |
| requires_python | >=3.8 |
| license | MIT License |
| keywords | language detection, nlp, text analysis, linguistics |
| VCS | |
| bugtrack_url | |
| requirements | No requirements were recorded. |
| Travis-CI | No Travis. |
| coveralls test coverage | No coveralls. |

MIT License

Copyright (c) 2025 SomeAB

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

# SomeLang
## Natural Language Detection Library
SomeLang is a lightweight and decently accurate natural language detection library. It is designed to be fast and Python-native, has no external dependencies for the main script, and is highly customizable through whitelists and blacklists.
## Installation
```bash
pip install somelang
```
## Features
- **Fast Natural Language Detection** - Trigram-based approach for accurate results
- **Default 158+ language whitelist** - The default whitelist provides better accuracy on short texts (3-100 characters)
- **Supports 194+ languages** - Can detect a wide range of languages in full mode
- **Modern Training Data** - Trained on OpenLID-v2 and many other modern datasets
- **Python-native** - No external dependencies for the main script
- **Customizable** - Configurable whitelist/blacklist support
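The feature list describes the detector as trigram-based, and the project credits franc as its inspiration. As a rough illustration of that general technique (not SomeLang's actual code), the sketch below builds ranked trigram profiles and picks the language with the smallest "out-of-place" distance. The helper names, the fixed miss penalty of 300, and the two one-sentence toy profiles are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List

MAX_PENALTY = 300  # assumed cost for a sample trigram absent from a profile

def trigrams(text: str) -> Counter:
    """Count character trigrams after lowercasing and padding word boundaries."""
    # Collapse whitespace runs and pad with spaces so word edges form trigrams.
    cleaned = " " + " ".join(text.lower().split()) + " "
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

def ranked(counts: Counter) -> List[str]:
    """Order trigrams from most to least frequent (ties broken alphabetically)."""
    return [gram for gram, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def distance(sample: List[str], profile: List[str]) -> int:
    """Out-of-place distance: sum of rank differences, fixed penalty for misses."""
    position = {gram: rank for rank, gram in enumerate(profile)}
    return sum(
        abs(rank - position[gram]) if gram in position else MAX_PENALTY
        for rank, gram in enumerate(sample)
    )

def detect(text: str, profiles: Dict[str, List[str]]) -> str:
    """Pick the language whose profile is closest to the sample's trigram ranking."""
    sample = ranked(trigrams(text))
    return min(profiles, key=lambda lang: distance(sample, profiles[lang]))

# Toy profiles built from one sentence each (real profiles come from large corpora).
profiles = {
    "eng": ranked(trigrams("the quick brown fox jumps over the lazy dog and the cat")),
    "fra": ranked(trigrams("le renard brun saute par dessus le chien et le chat")),
}

print(detect("the dog and the fox", profiles))  # eng
print(detect("le chien et le chat", profiles))  # fra
```

A production detector would also need large training corpora, a cap on profile length, and score normalization; those are left out here to keep the core idea visible.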
## Usage
### Basic Detection
```python
from somelang import somelang
# Basic language detection
lang = somelang("Bonjour tout le monde") # Returns: 'fra'
# Get language name instead of code
lang = somelang("Hello world", verbose=True) # Returns: 'English'
```
### Command Line
```bash
python -m somelang 'text to analyze'
```
### Advanced Usage
```python
from somelang import somelang_all, somelang_no_whitelist
# Get all probable languages with confidence scores
results = somelang_all("Hello world") # Returns: [['eng', 1.0], ...]
# Use all 194 languages (no whitelist)
lang = somelang_no_whitelist("Text in rare language")
```
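`somelang_all` is documented as returning `[code, score]` pairs such as `[['eng', 1.0], ...]`. If you want to apply your own whitelist, blacklist, or top-k cut to that output, a small post-filter like the one below may help. It is a hypothetical helper (not part of SomeLang), it assumes a higher score means a more likely language, and the `sample` list is made-up input shaped like the documented return value.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

def filter_candidates(
    results: Sequence[Sequence],
    whitelist: Optional[Iterable[str]] = None,
    blacklist: Optional[Iterable[str]] = None,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Filter [code, score] pairs by whitelist/blacklist and keep the best top_k."""
    allowed = set(whitelist) if whitelist is not None else None
    blocked = set(blacklist) if blacklist is not None else set()
    kept = [
        (code, float(score))
        for code, score in results
        if code not in blocked and (allowed is None or code in allowed)
    ]
    # Assumption: a higher score means a more likely language.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if top_k is None else kept[:top_k]

# Hypothetical input shaped like the documented somelang_all() return value.
sample = [["eng", 1.0], ["sco", 0.97], ["nld", 0.61], ["deu", 0.55]]

print(filter_candidates(sample, blacklist=["sco"]))
# [('eng', 1.0), ('nld', 0.61), ('deu', 0.55)]
print(filter_candidates(sample, whitelist={"deu", "nld"}, top_k=1))
# [('nld', 0.61)]
```

Filtering after detection only reorders and trims the candidates; it does not re-score them, so the surviving scores keep whatever scale the detector produced.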
### Note
Currently, the library expects input of at least 10 characters. Because of the trigram-based approach, it may also return false positives for texts shorter than 100 characters. This will be addressed in future updates.
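Given the length limits in the note, callers may want to skip very short inputs and flag results on texts under 100 characters as less reliable. The wrapper below is a hypothetical sketch: it takes any detector callable (for example `somelang`), and it assumes length is counted in characters after trimming surrounding whitespace, which the library does not document. A stub detector stands in so the sketch runs without the package installed.

```python
from typing import Callable, Optional, Tuple

MIN_LENGTH = 10        # minimum input length stated in the note
RELIABLE_LENGTH = 100  # below this, the note warns of possible false positives

def guarded_detect(
    text: str, detector: Callable[[str], str]
) -> Tuple[Optional[str], bool]:
    """Return (language, reliable); language is None when the text is too short."""
    # Assumption: length is counted in characters after trimming whitespace.
    stripped = text.strip()
    if len(stripped) < MIN_LENGTH:
        return None, False
    return detector(stripped), len(stripped) >= RELIABLE_LENGTH

# With the real library one would call: guarded_detect(text, somelang)
def stub_detector(text: str) -> str:
    """Placeholder detector that always answers 'eng'."""
    return "eng"

print(guarded_detect("Hi there", stub_detector))            # (None, False)
print(guarded_detect("Hello there, world", stub_detector))  # ('eng', False)
```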
## Citations
Trained mainly on the [OpenLID-v2 dataset](https://huggingface.co/datasets/laurievb/OpenLID-v2) and a few other datasets (for refinement).
Inspired by [franc](https://github.com/wooorm/franc) by [Titus Wormer](https://github.com/wooorm).
See the [CITATIONS](./CITATIONS.md) file for more details.
## License
This project is licensed under the [MIT](./LICENSE) license. Authored by [SomeAB](https://github.com/SomeAB).
## Raw data
{
"_id": null,
"home_page": "https://github.com/SomeAB/somelang",
"name": "somelang",
"maintainer": null,
"docs_url": null,
"requires_python": ">=3.8",
"maintainer_email": null,
"keywords": "language detection, nlp, text analysis, linguistics",
"author": "SomeAB",
"author_email": "SomeAB <ssabs@protonmail.com>",
"download_url": "https://files.pythonhosted.org/packages/16/ce/3acc9d84bc48fe1646b0436d4d87ccbae7296b3463884ffb591c3f48286e/somelang-0.0.3.tar.gz",
"platform": null,
"description": "# SomeLang\r\n\r\n## Natural Language Detection Library\r\n\r\nSomeLang is a lightweight and decently accurate natural language detection library. It is designed to be fast, python native, with no external dependencies for the main script, and highly customizable with support for whitelists and blacklists.\r\n\r\n## Installation\r\n\r\n```bash\r\npip install somelang\r\n```\r\n\r\n## Features\r\n\r\n- **Fast Natural Language Detection** - Trigrams-based approach for accurate results\r\n- **Default 158+ language whitelist** - The default whitelist provides better accuracy on short texts (3-100 characters)\r\n- **Supports 194+ languages** - Can detect a wide range of languages in full mode\r\n- **Modern Training Data** - Trained on OpenLID-v2 & many other modern datasets\r\n- **Python-native** - No external dependencies for main script\r\n- **Customizable** - Configurable whitelist/blacklist support\r\n\r\n## Usage\r\n\r\n### Basic Detection\r\n```python\r\nfrom somelang import somelang\r\n\r\n# Basic language detection\r\nlang = somelang(\"Bonjour tout le monde\") # Returns: 'fra'\r\n\r\n# Get language name instead of code\r\nlang = somelang(\"Hello world\", verbose=True) # Returns: 'English'\r\n```\r\n\r\n### Command Line\r\n```python\r\npython -m somelang 'text to analyze'\r\n```\r\n\r\n### Advanced Usage\r\n```python\r\n\r\nfrom somelang import somelang_all, somelang_no_whitelist\r\n\r\n# Get all probable languages with confidence scores\r\nresults = somelang_all(\"Hello world\") # Returns: [['eng', 1.0], ...]\r\n\r\n# Use all 194 languages (no whitelist)\r\nlang = somelang_no_whitelist(\"Text in rare language\")\r\n```\r\n\r\n### Note\r\n```\r\nCurrently, the library expects a minimum text length of 10 characters, but due to the current trigram-based approach, it may give a false positive on less than 100 character texts. This will be remedied in future updates.\r\n```\r\n\r\n## Citations \r\nTrained mainly on the [OpenLID-v2 dataset](https://huggingface.co/datasets/laurievb/OpenLID-v2) and a few other datasets (for refinement). \r\n\r\nInspired by [franc](https://github.com/wooorm/franc) by [Titus Wormer](https://github.com/wooorm).\r\n\r\nSee [CITATIONS](./CITATIONS.md) file for more details.\r\n\r\n## License\r\nThis project is licensed under the [MIT](./LICENSE) license. Authored by [SomeAB](https://github.com/SomeAB).\r\n\r\n",
"bugtrack_url": null,
"license": "MIT License\r\n \r\n Copyright (c) 2025 SomeAB\r\n \r\n Permission is hereby granted, free of charge, to any person obtaining a copy\r\n of this software and associated documentation files (the \"Software\"), to deal\r\n in the Software without restriction, including without limitation the rights\r\n to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\r\n copies of the Software, and to permit persons to whom the Software is\r\n furnished to do so, subject to the following conditions:\r\n \r\n The above copyright notice and this permission notice shall be included in all\r\n copies or substantial portions of the Software.\r\n \r\n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\r\n IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\r\n FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\r\n AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\r\n LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\r\n OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\r\n SOFTWARE.\r\n ",
"summary": "Language Detection Library",
"version": "0.0.3",
"project_urls": {
"Bug Reports": "https://github.com/SomeAB/somelang/issues",
"Homepage": "https://github.com/SomeAB/somelang",
"Source": "https://github.com/SomeAB/somelang"
},
"split_keywords": [
"language detection",
" nlp",
" text analysis",
" linguistics"
],
"urls": [
{
"comment_text": null,
"digests": {
"blake2b_256": "71091580170b1de4cb3fda6b09cc0c99cefdd86f6b08575ffa760d7a80347f17",
"md5": "ba2d5964d192adca0739275ef1b10a6f",
"sha256": "c56a134ad17a763abede6d37440e2e9787817e58bef16429914a8570fbfd5dfb"
},
"downloads": -1,
"filename": "somelang-0.0.3-py3-none-any.whl",
"has_sig": false,
"md5_digest": "ba2d5964d192adca0739275ef1b10a6f",
"packagetype": "bdist_wheel",
"python_version": "py3",
"requires_python": ">=3.8",
"size": 293198,
"upload_time": "2025-09-05T10:23:32",
"upload_time_iso_8601": "2025-09-05T10:23:32.711898Z",
"url": "https://files.pythonhosted.org/packages/71/09/1580170b1de4cb3fda6b09cc0c99cefdd86f6b08575ffa760d7a80347f17/somelang-0.0.3-py3-none-any.whl",
"yanked": false,
"yanked_reason": null
},
{
"comment_text": null,
"digests": {
"blake2b_256": "16ce3acc9d84bc48fe1646b0436d4d87ccbae7296b3463884ffb591c3f48286e",
"md5": "a11f280f95760277b7eef0fa4ff0199c",
"sha256": "1047edc41d763cbf7085d4e0a7ce75724fee5e47d58de3e96fa3f58cdaa490c8"
},
"downloads": -1,
"filename": "somelang-0.0.3.tar.gz",
"has_sig": false,
"md5_digest": "a11f280f95760277b7eef0fa4ff0199c",
"packagetype": "sdist",
"python_version": "source",
"requires_python": ">=3.8",
"size": 288609,
"upload_time": "2025-09-05T10:23:34",
"upload_time_iso_8601": "2025-09-05T10:23:34.130832Z",
"url": "https://files.pythonhosted.org/packages/16/ce/3acc9d84bc48fe1646b0436d4d87ccbae7296b3463884ffb591c3f48286e/somelang-0.0.3.tar.gz",
"yanked": false,
"yanked_reason": null
}
],
"upload_time": "2025-09-05 10:23:34",
"github": true,
"gitlab": false,
"bitbucket": false,
"codeberg": false,
"github_user": "SomeAB",
"github_project": "somelang",
"travis_ci": false,
"coveralls": false,
"github_actions": false,
"lcname": "somelang"
}