<p>
<a href="https://github.com/seanox/seanox-ai-nlp/pulls"
title="Development"
><img src="https://img.shields.io/badge/development-active-green?style=for-the-badge"
></a>
<a href="https://github.com/seanox/seanox-ai-nlp/issues"
><img src="https://img.shields.io/badge/maintenance-active-green?style=for-the-badge"
></a>
<a href="https://seanox.com/contact"
><img src="https://img.shields.io/badge/support-active-green?style=for-the-badge"
></a>
</p>
# Description
This package addresses challenges in semantic processing of domain-specific
content within NLP pipelines. It aims to improve the connection between semantic
user queries and structured, technically rich data, especially where traditional
embedding models and similarity metrics reach their limits.

The approach combines:

- __Token-sensitive preprocessing__ for better contextual understanding
- __Rule-based enhancements__ to detect technical terms and units
- __Modular components__ that integrate easily into existing retrieval systems

The package is intended to support existing NLP workflows with lightweight
components designed to better handle domain-specific terminology, structured
data, and semantic matching.

Further modules are planned to extend the package's capabilities, including:

- __Sentence Generator__: for creating synthetic, recombinable training data to
  support model fine-tuning
- __Logic Query Composer__: for transforming natural-language queries into
  structured formats (e.g. SQL, JSON, or YAML)

# Licence Agreement
LICENSE TERMS - Seanox Software Solutions is an open-source project, hereinafter
referred to as Seanox Software Solutions or Seanox for short.

This software is subject to Version 2 of the Apache License.

Copyright (C) 2025 Seanox Software Solutions

Licensed under the Apache License, Version 2.0 (the "License"); you may not use
this file except in compliance with the License. You may obtain a copy of the
License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

# System Requirements
- Python 3.9 or higher
# Installation & Setup
```
pip install seanox-ai-nlp
```
# Packages & Modules
## [units](seanox_ai_nlp/units/README.md)
The units module uses __rule-based__, __deterministic pattern recognition__ to
identify numerical expressions and units in text. It does not rely on __large
language models (LLMs)__ and is suitable for integration into __lightweight NLP
pipelines__. Its language-agnostic design and adaptable formatting support a
wide range of applications, including general, semi-technical, and semi-academic
content. The module can be integrated with tools such as spaCy’s __EntityRuler__
and supports __annotation__, __filtering__, and __token alignment__ workflows by
providing structured output for downstream semantic analysis; it does not
perform that analysis itself.
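
As a rough illustration of what rule-based, deterministic recognition means in
practice, the sketch below uses a single regular expression to find a signed
number followed by a known unit and returns character offsets. It is not the
module's API: the function name `find_quantities`, the `QUANTITY` label, and the
small unit list are hypothetical and chosen only for this example. Because no
model is involved, the same input always yields the same spans.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately small unit vocabulary (not the module's own list).
# "ºC" (ordinal indicator) is included as a common look-alike of "°C".
UNITS = ["km", "m", "kg", "g", "hPa", "°C", "ºC", "%"]

# Longest units first, so "km" is tried before "m" and "kg" before "g".
_UNIT_ALTERNATION = "|".join(
    sorted(map(re.escape, UNITS), key=len, reverse=True)
)

_QUANTITY = re.compile(
    r"(?<![\w.])"               # do not start inside another token
    r"[-+]?\d+(?:[.,]\d+)?"     # signed integer or decimal number
    r"\s*"                      # optional whitespace between value and unit
    r"(?:" + _UNIT_ALTERNATION + r")"
    r"(?![A-Za-z])"             # unit must not continue as a word ("5 months")
)


def find_quantities(text: str) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface text, label) for each number-unit match."""
    return [
        (m.start(), m.end(), m.group(), "QUANTITY")
        for m in _QUANTITY.finditer(text)
    ]


if __name__ == "__main__":
    for span in find_quantities("Temperature fell to -20 °C at 1000 hPa, 5 km away."):
        print(span)
```

The offsets could then be handed to spaCy via `Doc.char_span(start, end,
label=...)`, which returns `None` when the offsets do not line up with token
boundaries; that is where token alignment becomes relevant.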
### Features

- __Pattern-based extraction__  
  Identifies constructs such as _5 km_, _-20 °C_, or _1000 hPa_ using regular
  expressions and token patterns; no training is required.

- __Language-independent architecture__  
  Operates at the token and character level, so it can be applied to content in
  different languages.

- __Support for compound expressions__  
  Recognizes both unit combinations (_km/h, kWh/m², g/cm³_) and numerical
  constructs using signs and operators such as _±, ×, ·, :, /, ^, –_.

- __Integration-ready output__  
  Returns structured results compatible with tools such as spaCy’s EntityRuler
  for use in pipelines.

- __Transparent design__  
  Fully interpretable and deterministic; avoids black-box ML models, which
  supports reliable and auditable processing.
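
The compound-expression and integration-ready points can be sketched together.
The hypothetical grammar below accepts units joined by `/` or `·` with optional
exponents (`m²`, `cm³`, `m^2`) and an optional `±` tolerance, then turns the
matches into dictionaries of the `{"label": ..., "pattern": ...}` form that
spaCy's `EntityRuler.add_patterns` accepts as phrase patterns. The function name
`to_ruler_patterns`, the label, and the unit list are assumptions for
illustration; the module's actual output format is described in its own README.

```python
import re
from typing import Dict, List

# Hypothetical atomic units; longer alternatives come first ("kWh" before "km").
_ATOM = r"(?:kWh|km|cm|hPa|kg|m|g|h|s|W)"
# An atomic unit with an optional exponent, e.g. m², cm³ or m^2.
_FACTOR = _ATOM + r"(?:[²³]|\^\d+)?"
# Factors joined by "/" or "·", e.g. km/h, kWh/m², kg·m/s².
_UNIT = _FACTOR + r"(?:[/·]" + _FACTOR + r")*"
_NUMBER = r"[-+]?\d+(?:\.\d+)?"

# A value, an optional "± tolerance", then a (possibly compound) unit that must
# not be followed by further word characters or exponents.
_COMPOUND = re.compile(
    rf"(?<![\w.]){_NUMBER}(?:\s*±\s*\d+(?:\.\d+)?)?\s*{_UNIT}(?![\w²³])"
)


def to_ruler_patterns(text: str, label: str = "QUANTITY") -> List[Dict[str, str]]:
    """Collect unique matches, in order of appearance, as phrase-pattern dicts."""
    unique = dict.fromkeys(m.group() for m in _COMPOUND.finditer(text))
    return [{"label": label, "pattern": surface} for surface in unique]


if __name__ == "__main__":
    sample = "Drive at 80 km/h; yield 250 kWh/m² per year; mass 12 ± 0.5 kg."
    for pattern in to_ruler_patterns(sample):
        print(pattern)
```

With spaCy v3 installed, such a list could be passed to
`nlp.add_pipe("entity_ruler").add_patterns(patterns)`; whether a phrase pattern
like `80 km/h` matches later depends on how the tokenizer splits the text.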
### Quickstart
- [Usage](seanox_ai_nlp/units/README.md#usage)
- [Integration in NLP Workflows](seanox_ai_nlp/units/README.md#integration-in-nlp-workflows)
- [Downstream Processing with pandas](seanox_ai_nlp/units/README.md#downstream-processing-with-pandas)
# Changes
TODO:
# Contact
[Issues](https://github.com/seanox/seanox-ai-nlp/issues)  
[Requests](https://github.com/seanox/seanox-ai-nlp/pulls)  
[Mail](https://seanox.com/contact)