# flexi-nlp-tools

- **Name**: flexi-nlp-tools
- **Version**: 0.5.5
- **Summary**: NLP toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a focus on simplicity, performance, and flexibility.
- **Upload time**: 2025-02-12 13:38:40
- **Author**: Tetiana Lytvynenko
- **Requires Python**: >=3.11
- **License**: MIT
- **Keywords**: fuzzy search, flexi search, nlp tools, natural language processing, text processing, string matching, phonetic matching, language tools, transliteration, transliterator, text conversion, numeral converter, text normalization, linguistic tools, rule-based transliteration

[![Python Versions](https://img.shields.io/badge/Python%20Versions-%3E%3D3.11-informational)](https://pypi.org/project/nlp-flexi-tools/)
[![Version](https://img.shields.io/badge/Version-0.5.5-informational)](https://pypi.org/project/nlp-flexi-tools/)

A natural language processing toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a focus on simplicity, performance, and flexibility.

## Table of Contents

1. [Tools](#tools)
    - [FlexiDict](#flexidict)
    - [Numeral Converter](#numeral-converter)
    - [Lite Search](#lite-search)
    - [Lite Translit](#lite-translit)
2. [Environment Variables](#environment-variables)
3. [Installation](#installation)
4. [Demo](#demo)
5. [License](#license)

---

## Tools

### FlexiDict

#### Overview
**FlexiDict** is a flexible key-value storage structure where multiple values can be associated with a single key, and a single value can be referenced by multiple keys. 
Additionally, it provides robust search capabilities with error tolerance and correction for typos.
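
The many-to-many relationship described above can be pictured with plain Python containers. The following is only an illustrative sketch of the idea, not FlexiDict's implementation; the `MultiMap` class and its method names are hypothetical.

```python
from collections import defaultdict


class MultiMap:
    """Toy many-to-many mapping (hypothetical, for illustration only)."""

    def __init__(self):
        self._values_by_key = defaultdict(list)

    def add(self, key, value):
        # One key may collect several values; one value may sit under several keys.
        if value not in self._values_by_key[key]:
            self._values_by_key[key].append(value)

    def values(self, key):
        # .get() avoids creating an empty entry for unknown keys.
        return list(self._values_by_key.get(key, []))


mm = MultiMap()
mm.add("apple", 1)
mm.add("apple", 2)        # two values under one key
mm.add("green apple", 2)  # value 2 is reachable from two keys
print(mm.values("apple"))        # [1, 2]
print(mm.values("green apple"))  # [2]
```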

#### Initializing

##### Initializing with default settings

```python
from flexi_nlp_tools.flexi_dict import FlexiDict

flexi_dict = FlexiDict()
```

##### Initializing with custom settings

**Define a keyboard layout to calculate symbol distances for better typo handling:**

```python
from flexi_nlp_tools.flexi_dict.utils import calculate_symbols_distances

# Raw string: the backslash ending the second row is a key, not a line continuation.
symbol_keyboard = r"""
1234567890-=
 qwertyuiop[]\
 asdfghjkl;'
  zxcvbnm,./"""
symbols_distances = calculate_symbols_distances(symbol_keyboards=[symbol_keyboard, ])
```
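
To make the idea of symbol distances concrete, here is a minimal standalone sketch that treats the layout as a grid and measures the Euclidean distance between keys. It is an assumption about how such distances could be derived, not a description of what `calculate_symbols_distances` does internally; `key_positions` and `key_distance` are hypothetical helper names.

```python
import math

# Raw string so the trailing backslash on the second row stays a literal key.
LAYOUT = r"""
1234567890-=
 qwertyuiop[]\
 asdfghjkl;'
  zxcvbnm,./"""


def key_positions(layout):
    """Map each non-space symbol to its (row, column) position in the layout."""
    positions = {}
    for row, line in enumerate(layout.strip("\n").split("\n")):
        for col, symbol in enumerate(line):
            if symbol != " ":
                positions[symbol] = (row, col)
    return positions


def key_distance(a, b, positions):
    """Euclidean distance between two keys on the grid."""
    (r1, c1), (r2, c2) = positions[a], positions[b]
    return math.hypot(r1 - r2, c1 - c2)


positions = key_positions(LAYOUT)
print(key_distance("q", "w", positions))  # 1.0 (neighbouring keys)
print(key_distance("q", "p", positions))  # 9.0 (opposite ends of a row)
```

With distances like these, a substitution between adjacent keys can be treated as a more plausible typo than one between distant keys.
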
**Analyze a text corpus to compute symbol weights:**
```python
from flexi_nlp_tools.flexi_dict.utils import calculate_symbols_weights

corpus = [
    "apple red delicious",
    "apple fuji",
    "apple granny smith",
    "apple honeycrisp",
    "apple golden delicious",
    "apple pink lady"
]
symbol_weights = calculate_symbols_weights(corpus)
```
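
As a rough intuition for what a symbol weight could be, the sketch below computes the relative frequency of each character in a small corpus. This is an assumed, simplified definition for illustration; the weights returned by `calculate_symbols_weights` may be defined differently. `symbol_frequencies` is a hypothetical name.

```python
from collections import Counter


def symbol_frequencies(texts):
    """Relative frequency of every character across all texts."""
    counts = Counter(symbol for text in texts for symbol in text)
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()}


sample = ["apple fuji", "apple pink lady"]
freqs = symbol_frequencies(sample)
print(freqs["p"])  # 0.2 (5 of the 25 characters are "p")
```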

**Set custom prices for search correction to refine sorting:**
```python
from flexi_nlp_tools.flexi_dict.search_engine import SearchEngine
from flexi_nlp_tools.flexi_dict.search_engine.correction import (
  SymbolInsertion, 
  SymbolsTransposition, 
  SymbolsDeletion, 
  SymbolSubstitution
)

corrections = [
    SymbolInsertion(price=.05),
    SymbolsTransposition(price=.35),
    SymbolsDeletion(price=.4),
    SymbolSubstitution(price=.2)
]

search_engine = SearchEngine(
    symbol_weights=symbol_weights,
    symbols_distances=symbols_distances,
    corrections=corrections,
    symbol_insertion=SymbolInsertion(price=0.05)
)
```
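
To show how per-operation prices can shape ranking, here is a standalone weighted edit distance (the optimal string alignment variant) that uses the same four prices. It is a sketch under stated assumptions, not the library's search algorithm: the function name is hypothetical, and it assumes "insertion" means adding a symbol that is missing from the query and "deletion" means dropping a surplus one.

```python
def weighted_distance(query, key, insertion=0.05, deletion=0.4,
                      substitution=0.2, transposition=0.35):
    """Cheapest cost of editing `query` into `key` (optimal string alignment)."""
    n, m = len(query), len(key)
    # cost[i][j] = cheapest way to turn query[:i] into key[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * deletion
    for j in range(1, m + 1):
        cost[0][j] = j * insertion
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = query[i - 1] == key[j - 1]
            best = min(
                cost[i - 1][j] + deletion,    # drop a surplus query symbol
                cost[i][j - 1] + insertion,   # add a symbol missing from the query
                cost[i - 1][j - 1] + (0.0 if same else substitution),
            )
            # Two adjacent symbols typed in the wrong order.
            if (i > 1 and j > 1 and query[i - 1] == key[j - 2]
                    and query[i - 2] == key[j - 1]):
                best = min(best, cost[i - 2][j - 2] + transposition)
            cost[i][j] = best
    return cost[n][m]


print(weighted_distance("aplle", "apple"))  # 0.2  (one substitution)
print(weighted_distance("appl", "apple"))   # 0.05 (one insertion)
```

Because insertion is the cheapest operation here, a query that is a prefix of a key stays close to it, which fits the partial-match behaviour of a search box.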

**Initialize FlexiDict with custom settings**:
```python
flexi_dict = FlexiDict(search_engine=search_engine)
```

#### Core Functions

##### `flexi_dict.__getitem__(key)`
Retrieves the best-matching key with error and typo tolerance.

- **Parameters**:
  - `key` (*str*): The input key to search.

- **Returns**:
  - The best-matching key (*str*) if found, otherwise `None`.

- **Example**:

```python
flexi_dict["apple fuji"]
# Output: 'apple fuji'

flexi_dict["aplle fyjj"]
# Output: 'apple fuji'

flexi_dict["eplle fji"]
# Output: 'apple fuji'

flexi_dict["coffe"]
# Output: None
```

##### `flexi_dict.get(key)`
Performs a key search with error and typo tolerance.

- **Parameters**:
  - `key` (*str*): The input key to search.

- **Returns**:
  - A list of matching keys (*list[str]*) or an empty list if no matches are found.

- **Example**:

```python
flexi_dict.get("apple fuji")
# Output: ['apple fuji']

flexi_dict.get("aplle fyjj")
# Output: ['apple fuji']

flexi_dict.get("eplle fji")
# Output: ['apple fuji']

flexi_dict.get("coffe")
# Output: []
```

##### `flexi_dict.search(query)`
Finds all matching keys based on the given query, supporting partial matches and typo tolerance.

- **Parameters**:
  - `query` (*str*): The input query string to search for relevant keys.

- **Returns**:
  - A list of matching keys (*list[str]*) sorted by relevance.

- **Example**:

```python
flexi_dict.search("apple")
# Output: ['apple fuji', 'apple pink lady', 'apple honeycrisp', 'apple granny smith', 'apple red delicious', 'apple golden delicious']

flexi_dict.search("aplle")
# Output: ['apple fuji', 'apple pink lady', 'apple honeycrisp', 'apple granny smith', 'apple red delicious', 'apple golden delicious']

flexi_dict.search("apl")
# Output: ['apple fuji', 'apple pink lady', 'apple honeycrisp', 'apple granny smith', 'apple red delicious', 'apple golden delicious']

flexi_dict.search("apple hon")
# Output: ['apple honeycrisp', 'apple fuji', 'apple pink lady', 'apple granny smith', 'apple red delicious', 'apple golden delicious']
```

---

### Numeral Converter

#### Overview
**Numeral Converter** is a Python library that provides functionality to convert numbers to text and vice versa, supporting multiple languages. 
It also allows the processing of numbers in text with support for grammatical cases, gender, and pluralization. 
Additionally, it can detect and convert numbers embedded in sentences into their numerical equivalents.

---

#### Supported Languages
- **English (en)**
- **Ukrainian (uk)**
- **Russian (ru)**

---

#### Core Functions

##### `get_available_languages()`
Retrieves a list of languages supported by the numeral converter.

- **Returns**:
  - A list of language codes (e.g., `['uk', 'en', 'ru']`).

- **Example**:

```python
from flexi_nlp_tools.numeral_converter import get_available_languages

print(get_available_languages())  # Output: ['uk', 'en', 'ru']
```

---

##### `get_max_order(lang)`
Returns the maximum numerical order supported for a specific language.

- **Parameters**:
  - `lang` (*str*): The language code (e.g., `'en'`, `'uk'`, `'ru'`).

- **Returns**:
  - The maximum numerical order as an integer.

- **Example**:

```python
from flexi_nlp_tools.numeral_converter import get_max_order

print(get_max_order('en'))  # Output: 47
print(get_max_order('uk'))  # Output: 65
```

---

##### `numeral2int(numeral, lang)`
Converts a numeral in text form into its integer representation.

- **Parameters**:
  - `numeral` (*str*): The numeral string (e.g., `'one'`, `'одного'`).
  - `lang` (*str*): The language code (e.g., `'en'`, `'uk'`, `'ru'`).

- **Returns**:
  - An integer representing the value of the numeral.

- **Example**:

```python
from flexi_nlp_tools.numeral_converter import numeral2int

print(numeral2int('one', 'en'))  # Output: 1
print(numeral2int('одного', 'ru'))  # Output: 1
print(numeral2int('тисячний', 'uk'))  # Output: 1000
```
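
For readers curious about the general technique, the toy parser below converts English cardinal numerals to integers by accumulating units, tens, and scale words. It is a simplified standalone sketch, not the library's implementation: it handles no ordinals, inflected forms, or other languages, and `words_to_int` is a hypothetical name.

```python
UNITS = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * value for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text):
    """Parse an English cardinal numeral such as 'two thousand twenty-three'."""
    total = current = 0
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            # Close the current group (e.g. "two" before "thousand").
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unknown numeral word: {word!r}")
    return total + current


print(words_to_int("two thousand twenty-three"))      # 2023
print(words_to_int("one hundred and forty-two"))      # 142
```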

---

##### `int2numeral(value, lang, num_class=None, gender=None, case=None, number=None)`
Converts an integer into its textual representation.

- **Parameters**:
  - `value` (*int*): The numerical value to convert.
  - `lang` (*str*): The language code.
  - `num_class` (*NumClass*, optional): Specifies the numeral class (`CARDINAL` or `ORDINAL`).
  - `gender` (*Gender*, optional): Specifies the grammatical gender (`MASCULINE`, `FEMININE`, `NEUTER`).
  - `case` (*Case*, optional): Specifies the grammatical case (`NOMINATIVE`, `GENITIVE`, etc.).
  - `number` (*Number*, optional): Specifies singular or plural (`SINGULAR`, `PLURAL`).

- **Returns**:
  - A string representing the numeral in text form.

- **Example**:

```python
from flexi_nlp_tools.numeral_converter import int2numeral

print(int2numeral(
    2023,
    lang="uk",
    num_class='ORDINAL',
    number='SINGULAR',
))
# Output: "дві тисячі двадцять третій"
```

---

##### `convert_numerical_in_text(text, lang, **kwargs)`
Detects numbers in a string and converts them into their numerical representation.

- **Parameters**:
  - `text` (*str*): The input text containing numerical values.
  - `lang` (*str*): The language code.

- **Returns**:
  - A string with detected numbers converted to numerical form.

- **Example**:

```python
from flexi_nlp_tools.numeral_converter import convert_numerical_in_text

text = (
  "After twenty, numbers such as twenty-five and fifty follow. "
  "For example thirty-three is thirty plus three."
)
result = convert_numerical_in_text(text, lang="en")
print(result)
# Output: "After 20, numbers such as 25 and 50 follow. "
#         "For example 33 is 30 plus 3." 
```
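
The detection step can be approximated with a regular expression that finds runs of numeral words and replaces each run with digits. The sketch below is a deliberately small standalone illustration that only covers English numbers below one hundred; it is not how `convert_numerical_in_text` is implemented, and `replace_numerals` is a hypothetical name.

```python
import re

SMALL = {word: value for value, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * value for value, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

# Longest words first so "seventeen" is preferred over "seven".
WORD = "|".join(sorted(list(SMALL) + list(TENS), key=len, reverse=True))
# A run of numeral words joined by spaces or hyphens, e.g. "twenty-five".
RUN = re.compile(rf"\b(?:{WORD})(?:[ -](?:{WORD}))*\b", re.IGNORECASE)


def replace_numerals(text):
    """Replace each run of numeral words with the sum of its parts."""
    def to_digits(match):
        words = re.split(r"[ -]", match.group(0).lower())
        return str(sum(SMALL.get(w, 0) + TENS.get(w, 0) for w in words))
    return RUN.sub(to_digits, text)


print(replace_numerals("After twenty, numbers such as twenty-five and fifty follow."))
# After 20, numbers such as 25 and 50 follow.
```

Summing the parts is only valid for simple tens-plus-units numerals; a real converter also has to handle scale words, ordinals, and grammatical forms.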

---

### Lite Search

#### Overview
**Lite Search** is designed for efficient fuzzy searching and indexing of text data.
It lets you build a search index from textual data and perform approximate matches on queries,
with optional transliteration for non-Latin scripts.
The library is lightweight and suited to scenarios where quick, non-exact text matching is required.

#### Core Functions

##### `build_search_index(data, transliterate_latin=False)`
Builds a search index from a dataset.

- **Parameters**:
  - `data` (*list of tuples*): The dataset to index, where each tuple contains a unique identifier and a string value (e.g., `[(1, "text1"), (2, "text2")]`).
  - `transliterate_latin` (*bool*, optional): Enables transliteration of non-Latin scripts for better matching.

- **Returns**:
  - A search index object that can be used with `fuzzy_search`.

- **Example**:

```python
from flexi_nlp_tools.lite_search import build_search_index

data = [(1, "one"), (2, "two"), (3, "three")]
search_index = build_search_index(data)
```

##### `fuzzy_search(query, search_index, topn=None)`
Performs a fuzzy search on the given query.

- **Parameters**:
  - `query` (*str*): The search query string.
  - `search_index` (*object*): The search index generated by `build_search_index`.
  - `topn` (*int*, optional): Limits the number of results returned. If `None`, all matching results are returned.

- **Returns**:
  - A list of identifiers (from the dataset) ranked by relevance.

- **Example**:

```python
from flexi_nlp_tools.lite_search import fuzzy_search

result = fuzzy_search(query="one", search_index=search_index)
print(result)
# Output: [1]
```
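
For comparison, the same "list of `(id, text)` pairs in, ranked ids out" contract can be imitated with the standard library alone. The sketch below uses `difflib.SequenceMatcher` as a stand-in similarity measure; it is not the algorithm behind `fuzzy_search`, and `toy_fuzzy_search` and its `min_ratio` parameter are hypothetical.

```python
from difflib import SequenceMatcher


def toy_fuzzy_search(query, data, topn=None, min_ratio=0.6):
    """Return ids whose text resembles the query, best match first."""
    scored = []
    for identifier, text in data:
        ratio = SequenceMatcher(None, query.lower(), text.lower()).ratio()
        if ratio >= min_ratio:
            scored.append((ratio, identifier))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Slicing with topn=None keeps every match.
    return [identifier for _, identifier in scored[:topn]]


data = [(1, "one"), (2, "two"), (3, "three")]
print(toy_fuzzy_search("one", data))   # [1]
print(toy_fuzzy_search("thre", data))  # [3]
```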

##### `fuzzy_search_internal(query, search_index, topn=None)`
Returns detailed information about the matching process, including corrections applied to the query.

- **Parameters**:
  - Same as `fuzzy_search`.

- **Returns**:
  - A list of objects containing detailed matching information.

---

#### Usage Examples

##### Example 1: Basic Fuzzy Search

```python
from flexi_nlp_tools.lite_search import build_search_index, fuzzy_search

data = [(1, "one"), (2, "two"), (3, "three")]
search_index = build_search_index(data)

result = fuzzy_search(query="one", search_index=search_index)
print(result)  # Output: [1]
```

##### Example 2: Fuzzy Search with Transliteration

```python
from flexi_nlp_tools.lite_search import build_search_index, fuzzy_search

data = [(1, "ван"), (2, "ту"), (3, "срі")]
search_index = build_search_index(data, transliterate_latin=True)

result = fuzzy_search(query="ван", search_index=search_index)
print(result)  # Output: [1]
```

##### Example 3: Advanced Query Matching

```python
from flexi_nlp_tools.lite_search import build_search_index, fuzzy_search

data = [
  (1, "Burger Vegan"),
  (2, "Burger with Pork"),
  (3, "Burger with Meat and Garlic"),
]
search_index = build_search_index(data)

query = "burger"
result = fuzzy_search(query=query, search_index=search_index)
print(result)  # Output: [1, 2, 3]
```

##### Example 4: Detailed Search Results

```python
from flexi_nlp_tools.lite_search import fuzzy_search_internal

# Reuses `search_index` built in Example 3.
query = "bollo"
result = fuzzy_search_internal(query=query, search_index=search_index)
for match in result:
    print(match)
```

---

### Lite Translit

`lite_translit` is a lightweight rule-based transliteration tool for converting text between English, Ukrainian, and Russian. 
It approximates phonetic pronunciation, considers letter case, and adapts transliteration based on letter position in a word.
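
The three ingredients named above (a rule table, case handling, and position-dependent rules) can be sketched in a few lines. The rules below are invented for illustration and cover only a handful of letters; they are not the rule set used by `lite_translit`, and `toy_translit` is a hypothetical name.

```python
# Illustrative rules only: two digraphs and a partial single-letter table.
DIGRAPHS = {"sh": "ш", "ch": "ч"}
LETTERS = {
    "a": "а", "b": "б", "d": "д", "e": "е", "i": "і", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т",
    "u": "у", "v": "в",
}


def toy_translit(word):
    """Transliterate one English word with a tiny rule table."""
    lower = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = lower[i:i + 2]
        if pair in DIGRAPHS:
            target, step = DIGRAPHS[pair], 2
        elif lower[i] == "e" and i == len(word) - 1 and len(word) > 2:
            # Position rule: treat a word-final "e" as silent.
            target, step = "", 1
        else:
            # Unknown symbols pass through unchanged.
            target, step = LETTERS.get(lower[i], word[i]), 1
        # Case rule: keep the capitalisation of the source letter.
        out.append(target.capitalize() if word[i].isupper() else target)
        i += step
    return "".join(out)


print(toy_translit("Shop"))  # Шоп
print(toy_translit("line"))  # лін
```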


#### Core Functions

##### `en2uk_translit(text)`
Transliterates English text into Ukrainian, preserving phonetic accuracy and considering letter positions.

- **Parameters**:
  - `text` (*str*): The input English text to transliterate.

- **Returns**:
  - A Ukrainian transliterated string (*str*).

- **Example**:

```python
en2uk_translit("Tempus fugit... carpe diem!")
# Output: "Темпус фугіт... карп дєм!"

en2uk_translit("Veni, vidi, vici!")
# Output: "Вені, віді, вічі!"

en2uk_translit(
  "His conscience was clear, even as he tried to maintain the consistency of his work "
  "on the Lucene project for Samsung, while sipping a cold cola.")
# Output: "Хіс коншєнс вас кліар, евен ас хі трєд то маінтаін сі консістенкі оф хіс ворк "
#         "он сі Лусен прожечт фор Самсунг, віл сіппінг а колд кола."

```

##### `en2ru_translit(text)`
Transliterates English text into Russian, preserving phonetic accuracy.

- **Parameters**:
  - `text` (*str*): The input English text to transliterate.

- **Returns**:
  - A Russian transliterated string (*str*).

- **Example**:

```python
en2ru_translit(
  "After a long day, he enjoyed a refreshing Borjomi and Coca-Cola, "
  "feeling victorious like Vici in battle, while watching "
  "the lively citrus circus under the bright lights.")
# Output: "Афтер а лонг дай, хи енжоед а рефрешинг Боржоми анд Кока-Кола, "
#         "филинг вичторйоус лик Вичи ин баттл, вил ватчинг "
#         "си ливели китрус киркус андер си брижт лайтс."

```

##### `uk2ru_translit(text)`
Transliterates Ukrainian text into Russian while maintaining phonetic consistency.

- **Parameters**:
  - `text` (*str*): The input Ukrainian text to transliterate.

- **Returns**:
  - A Russian transliterated string (*str*).

- **Example**:

```python
uk2ru_translit("У мрії вона вирушила на подвір’я, де вітер розносив пір’їнки, і все навколо стало казкою.")
# Output: "У мрийи вона вырушыла на подвирья, де витер розносыв пирьйинкы, и все навколо стало казкою."
```

---


## Environment Variables

The following environment variables customize the behavior of the package.
Each module validates its environment variables to ensure they meet the expected constraints.
Invalid values raise an `InvalidEnvironmentVariable` exception.
Default values are used when a variable is not explicitly set.
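
The validate-or-fall-back behaviour described above follows a common pattern, sketched here in standalone form. The exception class is a local stand-in with the same name as the one mentioned above, and `read_float_env` is a hypothetical helper; neither is taken from the package's source.

```python
import os


class InvalidEnvironmentVariable(ValueError):
    """Local stand-in for the exception named above (hypothetical definition)."""


def read_float_env(name, default, low=0.0, high=1.0):
    """Read a float from the environment, enforcing the range [low, high]."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # variable not set: use the documented default
    try:
        value = float(raw)
    except ValueError as exc:
        raise InvalidEnvironmentVariable(f"{name}={raw!r} is not a number") from exc
    if not low <= value <= high:
        raise InvalidEnvironmentVariable(f"{name}={value} is outside [{low}, {high}]")
    return value


os.environ["DEFAULT_DELETION_PRICE"] = "0.3"
print(read_float_env("DEFAULT_DELETION_PRICE", default=0.4))  # 0.3
```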

### FlexiDict environment variables
- **`DEFAULT_TOPN_LEAVES`** (default: `10`): A positive integer representing the maximum number of top leaves to retrieve in searches. Must be greater than `0`.
- **`MIN_CORRECTION_PRICE`** (default: `1e-5`): A float in the range `[0, 1]`, representing the minimum price for applying a correction.
- **`MAX_CORRECTION_RATE`** (default: `2/3`): A float in the range `[0, 1]`, representing the maximum correction rate allowed.
- **`MAX_CORRECTION_RATE_FOR_SEARCH`** (default: `1.`): A float in the range `[0, 1]`, representing the maximum correction rate allowed when adding leaves.
- **`DEFAULT_DELETION_PRICE`** (default: `0.4`): A float in the range `[0, 1]`, representing the cost of a deletion operation.
- **`DEFAULT_SUBSTITUTION_PRICE`** (default: `0.2`): A float in the range `[0, 1]`, representing the cost of a substitution operation.
- **`DEFAULT_INSERTION_PRICE`** (default: `0.05`): A float in the range `[0, 1]`, representing the cost of an insertion operation.
- **`DEFAULT_TRANSPOSITION_PRICE`** (default: `0.35`): A float in the range `[0, 1]`, representing the cost of a transposition operation.
- **`MAX_QUEUE_SIZE`** (default: `1024`): A positive integer defining the maximum queue size for processing tasks. Must be greater than `0`.

### LiteSearch environment variables
- **`MIN_START_TOKEN_LENGTH`** (default: `3`): A positive integer defining the minimum length of a starting token. Must be greater than `0`.
- **`DEFAULT_QUERY_TRANSFORMATION_PRICE`** (default: `0.4`): A float in the range `[0, ∞)`, representing the cost of a query transformation. Must be non-negative.

### NumeralConverter environment variables
- **`MAX_NUMERAL_LENGTH`** (default: `2048`): The maximum length of a numeral string to process.

---

## Installation

Install flexi-nlp-tools from PyPI with pip:

```bash
pip install flexi-nlp-tools
```
---

## Demo

Check out the live demo of Flexi NLP Tools here:

[Flexi NLP Tools Demo](https://flexi-nlp-tools.fly.dev/)

---

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

            

Raw data

            {
    "_id": null,
    "home_page": null,
    "name": "flexi-nlp-tools",
    "maintainer": null,
    "docs_url": null,
    "requires_python": ">=3.11",
    "maintainer_email": null,
    "keywords": "fuzzy search, flexi search, nlp tools, natural language processing, text processing, string matching, phonetic matching, language tools, transliteration, transliterator, text conversion, numeral converter, text normalization, linguistic tools, rule-based transliteration",
    "author": "Tetiana Lytvynenko",
    "author_email": "lytvynenkotv@gmail.com",
    "download_url": "https://files.pythonhosted.org/packages/2c/0a/1a5e45ff360ec2bea8527451d02eaa7a548a97e154f200f6ef0b0969f8ff/flexi_nlp_tools-0.5.5.tar.gz",
    "platform": null,
    "description": "# flexi-nlp-tools\n\n[![Python Versions](https://img.shields.io/badge/Python%20Versions-%3E%3D3.11-informational)](https://pypi.org/project/nlp-flexi-tools/)\n[![Version](https://img.shields.io/badge/Version-0.5.5-informational)](https://pypi.org/project/nlp-flexi-tools/)\n\nA natural language processing toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a focus on simplicity, performance, and flexibility.\n\n## Table of Contents\n\n1. [Tools](#tools)\n    - [FlexiDict](#flexidict)\n    - [Numeral Converter](#numeral-converter)\n    - [Lite Search](#lite-search)\n    - [Lite Translit](#lite-translit)\n2. [Installation](#installation)\n3. [Demo](#demo)\n4. [License](#license)\n\n---\n\n## Tools\n\n### FlexiDict\n\n#### Overview\n**FlexiDict** is a flexible key-value storage structure where multiple values can be associated with a single key, and a single value can be referenced by multiple keys. \nAdditionally, it provides robust search capabilities with error tolerance and correction for typos.\n\n#### Initializing\n\n##### Initializing with default settings\n\n```python\nfrom flexi_nlp_tools.flexi_dict import FlexiDict\n\nflexi_dict = FlexiDict()\n```\n\n##### Initializing with custom settings\n\n**Define a keyboard layout to calculate symbol distances for better typo handling:**\n\n```python\nfrom flexi_nlp_tools.flexi_dict.utils import calculate_symbols_distances\n\nsymbol_keyboard = \"\"\"\n1234567890-=\n qwertyuiop[]\\\n asdfghjkl;'\n  zxcvbnm,./\"\"\"\nsymbols_distances = calculate_symbols_distances(symbol_keyboards=[symbol_keyboard, ])\n```\n**Analyze a text corpus to compute symbol weights:**\n```python\nfrom flexi_nlp_tools.flexi_dict.utils import calculate_symbols_weights\n\ncorpus = [\n    \"apple red delicious\",\n    \"apple fuji\",\n    \"apple granny smith\",\n    \"apple honeycrisp\",\n    \"apple golden delicious\",\n    \"apple pink lady\"\n]\nsymbol_weights = 
calculate_symbols_weights(corpus)\n```\n\n**Set custom prices for search correction to refine sorting:**\n```python\nfrom flexi_nlp_tools.flexi_dict.search_engine import SearchEngine\nfrom flexi_nlp_tools.flexi_dict.search_engine.correction import (\n  SymbolInsertion, \n  SymbolsTransposition, \n  SymbolsDeletion, \n  SymbolSubstitution\n)\n\ncorrections = [\n    SymbolInsertion(price=.05),\n    SymbolsTransposition(price=.35),\n    SymbolsDeletion(price=.4),\n    SymbolSubstitution(price=.2)\n]\n\nsearch_engine = SearchEngine(\n    symbol_weights=symbol_weights,\n    symbols_distances=symbols_distances,\n    corrections=corrections,\n    symbol_insertion=SymbolInsertion(price=0.05)\n)\n```\n\n**Initialize FlexiDict with custom settings**:\n```python\nflexi_dict = FlexiDict(search_engine = search_engine)\n```\n\n#### Core Functions\n\n##### `flexi_dict.__getitem__(key)`\nRetrieves the best-matching key with error and typo tolerance.\n\n- **Parameters**:\n  - `key` (*str*): The input key to search.\n\n- **Returns**:\n  - The best-matching key (*str*) if found, otherwise `None`.\n\n- **Example**:\n\n```python\nflexi_dict[\"apple fuji\"]\n# Output: 'apple fuji'\n\nflexi_dict[\"aplle fyjj\"]\n# Output: 'apple fuji'\n\nflexi_dict[\"eplle fji\"]\n# Output: 'apple fuji'\n\nflexi_dict[\"coffe\"]\n# Output: None\n```\n\n##### `flexi_dict.get(key)`\nPerforms a key search with error and typo tolerance.\n\n- **Parameters**:\n  - `key` (*str*): The input key to search.\n\n- **Returns**:\n  - A list of matching keys (*list[str]*) or an empty list if no matches are found.\n\n- **Example**:\n\n```python\nflexi_dict.get(\"apple fuji\")\n# Output: ['apple fuji']\n\nflexi_dict.get(\"aplle fyjj\")\n# Output: ['apple fuji']\n\nflexi_dict.get(\"eplle fji\")\n# Output: ['apple fuji']\n\nflexi_dict.get(\"coffe\")\n# Output: []\n```\n\n##### `flexi_dict.search(query)`\nFinds all matching keys based on the given query, supporting partial matches and typo tolerance.\n\n- **Parameters**:\n  
- `query` (*str*): The input query string to search for relevant keys.\n\n- **Returns**:\n  - A list of matching keys (*list[str]*) sorted by relevance.\n\n- **Example**:\n\n```python\nflexi_dict.search(\"apple\")\n# Output: ['apple fuji', 'apple pink lady', 'apple honeycrisp', 'apple granny smith', 'apple red delicious', 'apple golden delicious']\n\nflexi_dict.search(\"aplle\")\n# Output: ['apple fuji', 'apple pink lady', 'apple honeycrisp', 'apple granny smith', 'apple red delicious', 'apple golden delicious']\n\nflexi_dict.search(\"apl\")\n# Output: ['apple fuji', 'apple pink lady', 'apple honeycrisp', 'apple granny smith', 'apple red delicious', 'apple golden delicious']\n\nflexi_dict.search(\"apple hon\")\n# Output: ['apple honeycrisp', 'apple fuji', 'apple pink lady', 'apple granny smith', 'apple red delicious', 'apple golden delicious']\n```\n\n---\n\n### Numeral Converter\n\n#### Overview\n**Numeral Converter** is a Python library that provides functionality to convert numbers to text and vice versa, supporting multiple languages. \nIt also allows the processing of numbers in text with support for grammatical cases, gender, and pluralization. 
\nAdditionally, it can detect and convert numbers embedded in sentences into their numerical equivalents.\n\n---\n\n#### Supported Languages\n- **English (en)**\n- **Ukrainian (uk)**\n- **Russian (ru)**\n\n---\n\n#### Core Functions\n\n##### `get_available_languages()`\nRetrieves a list of languages supported by the numeral converter.\n\n- **Returns**:\n  - A list of language codes (e.g., `['uk', 'en', 'ru']`).\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.numeral_converter import get_available_languages\n\nprint(get_available_languages())  # Output: ['uk', 'en', 'ru']\n```\n\n---\n\n##### `get_max_order(lang)`\nReturns the maximum numerical order supported for a specific language.\n\n- **Parameters**:\n  - `lang` (*str*): The language code (e.g., `'en'`, `'uk'`, `'ru'`).\n\n- **Returns**:\n  - The maximum numerical order as an integer.\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.numeral_converter import get_max_order\n\nprint(get_max_order('en'))  # Output: 47\nprint(get_max_order('uk'))  # Output: 65\n```\n\n---\n\n##### `numeral2int(numeral, lang)`\nConverts a numeral in text form into its integer representation.\n\n- **Parameters**:\n  - `numeral` (*str*): The numeral string (e.g., `'one'`, `'\u043e\u0434\u043d\u043e\u0433\u043e'`).\n  - `lang` (*str*): The language code (e.g., `'en'`, `'uk'`, `'ru'`).\n\n- **Returns**:\n  - An integer representing the value of the numeral.\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.numeral_converter import numeral2int\n\nprint(numeral2int('one', 'en'))  # Output: 1\nprint(numeral2int('\u043e\u0434\u043d\u043e\u0433\u043e', 'ru'))  # Output: 1\nprint(numeral2int('\u0442\u0438\u0441\u044f\u0447\u043d\u0438\u0439', 'uk'))  # Output: 1000\n```\n\n---\n\n##### `int2numeral(value, lang, num_class=None, gender=None, case=None, number=None)`\nConverts an integer into its textual representation.\n\n- **Parameters**:\n  - `value` (*int*): The numerical value to convert.\n  - `lang` (*str*): The language code.\n 
 - `num_class` (*NumClass*, optional): Specifies the numeral class (`CARDINAL` or `ORDINAL`).\n  - `gender` (*Gender*, optional): Specifies the grammatical gender (`MASCULINE`, `FEMININE`, `NEUTER`).\n  - `case` (*Case*, optional): Specifies the grammatical case (`NOMINATIVE`, `GENITIVE`, etc.).\n  - `number` (*Number*, optional): Specifies singular or plural (`SINGULAR`, `PLURAL`).\n\n- **Returns**:\n  - A string representing the numeral in text form.\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.numeral_converter import int2numeral\n\nprint(int2numeral(\n  2023,\n  lang=\"uk\",\n  num_class='ORDINAL',\n  number='SINGULAR')\n# Output: \"\u0434\u0432\u0456 \u0442\u0438\u0441\u044f\u0447\u0456 \u0434\u0432\u0430\u0434\u0446\u044f\u0442\u044c \u0442\u0440\u0435\u0442\u0456\u0439\"\n```\n\n---\n\n##### `convert_numerical_in_text(text, lang, **kwargs)`\nDetects numbers in a string and converts them into their numerical representation.\n\n- **Parameters**:\n  - `text` (*str*): The input text containing numerical values.\n  - `lang` (*str*): The language code.\n\n- **Returns**:\n  - A string with detected numbers converted to numerical form.\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.numeral_converter import convert_numerical_in_text\n\ntext = (\n  \"After twenty, numbers such as twenty-five and fifty follow. \"\n  \"For example thirty-three is thirty plus three.\"\n)\nresult = convert_numerical_in_text(text, lang=\"en\")\nprint(result)\n# Output: \"After 20, numbers such as 25 and 50 follow. \"\n#         \"For example 33 is 30 plus 3.\" \n```\n\n---\n\n### Lite Search\n\n#### Overview\n**Lite Search** designed for efficient fuzzy searching and indexing of text data. \nIt enables you to build a search index from textual data and perform approximate matches on queries, \nsupporting optional transliteration for non-Latin scripts. 
\nThe library is lightweight and ideal for scenarios where quick, non-exact text matching is required.\n\n#### Core Functions\n\n##### `build_search_index(data, transliterate_latin=False)`\nBuilds a search index from a dataset.\n\n- **Parameters**:\n  - `data` (*list of tuples*): The dataset to index, where each tuple contains a unique identifier and a string value (e.g., `[(1, \"text1\"), (2, \"text2\")]`).\n  - `transliterate_latin` (*bool*, optional): Enables transliteration of non-Latin scripts for better matching.\n\n- **Returns**:\n  - A search index object that can be used with `fuzzy_search`.\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.lite_search import build_search_index\n\ndata = [(1, \"one\"), (2, \"two\"), (3, \"three\")]\nsearch_index = build_search_index(data)\n```\n\n##### `fuzzy_search(query, search_index, topn=None)`\nPerforms a fuzzy search on the given query.\n\n- **Parameters**:\n  - `query` (*str*): The search query string.\n  - `search_index` (*object*): The search index generated by `build_search_index`.\n  - `topn` (*int*, optional): Limits the number of results returned. 
If `None`, all matching results are returned.\n\n- **Returns**:\n  - A list of identifiers (from the dataset) ranked by relevance.\n\n- **Example**:\n\n```python\nfrom flexi_nlp_tools.lite_search import fuzzy_search\n\nresult = fuzzy_search(query=\"one\", search_index=search_index)\nprint(result)\n# Output: [1]\n```\n\n##### `fuzzy_search_internal(query, search_index, topn=None)`\nReturns detailed information about the matching process, including corrections applied to the query.\n\n- **Parameters**:\n  - Same as `fuzzy_search`.\n\n- **Returns**:\n  - A list of objects containing detailed matching information.\n\n---\n\n#### Usage Examples\n\n##### Example 1: Basic Fuzzy Search\n\n```python\nfrom flexi_nlp_tools.lite_search import build_search_index, fuzzy_search\n\ndata = [(1, \"one\"), (2, \"two\"), (3, \"three\")]\nsearch_index = build_search_index(data)\n\nresult = fuzzy_search(query=\"one\", search_index=search_index)\nprint(result)  # Output: [1]\n```\n\n##### Example 2: Fuzzy Search with Transliteration\n\n```python\nfrom flexi_nlp_tools.lite_search import build_search_index, fuzzy_search\n\ndata = [(1, \"\u0432\u0430\u043d\"), (2, \"\u0442\u0443\"), (3, \"\u0441\u0440\u0456\")]\nsearch_index = build_search_index(data, transliterate_latin=True)\n\nresult = fuzzy_search(query=\"\u0432\u0430\u043d\", search_index=search_index)\nprint(result)  # Output: [1]\n```\n\n##### Example 3: Advanced Query Matching\n\n```python\nfrom flexi_nlp_tools.lite_search import build_search_index, fuzzy_search\n\ndata = [\n  (1, \"Burger Vegan\"),\n  (2, \"Burger with Pork\"),\n  (3, \"Burger with Meat and Garlic\"),\n]\nsearch_index = build_search_index(data)\n\nquery = \"burger\"\nresult = fuzzy_search(query=query, search_index=search_index)\nprint(result)  # Output: [1, 2, 3]\n```\n\n##### Example 4: Detailed Search Results\n\n```python\nfrom flexi_nlp_tools.lite_search import fuzzy_search_internal\n\nquery = \"bollo\"\nresult = fuzzy_search_internal(query=query, 
search_index=search_index)\nfor match in result:\n  print(match)\n```\n\n---\n\n### Lite Translit\n\n`lite_translit` is a lightweight rule-based transliteration tool for converting text between English, Ukrainian, and Russian. \nIt approximates phonetic pronunciation, considers letter case, and adapts transliteration based on letter position in a word.\n\n\n#### Core Functions\n\n##### `en2uk_translit(text)`\nTransliterates English text into Ukrainian, preserving phonetic accuracy and considering letter positions.\n\n- **Parameters**:\n  - `text` (*str*): The input English text to transliterate.\n\n- **Returns**:\n  - A Ukrainian transliterated string (*str*).\n\n- **Example**:\n\n```python\nen2uk_translit(\"Tempus fugit... carpe diem!\")\n# Output: \"\u0422\u0435\u043c\u043f\u0443\u0441 \u0444\u0443\u0433\u0456\u0442... \u043a\u0430\u0440\u043f \u0434\u0454\u043c!\"\n\nen2uk_translit(\"Veni, vidi, vici!\")\n# Output: \"\u0412\u0435\u043d\u0456, \u0432\u0456\u0434\u0456, \u0432\u0456\u0447\u0456!\"\n\nen2uk_translit(\n  \"His conscience was clear, even as he tried to maintain the consistency of his work \"\n  \"on the Lucene project for Samsung, while sipping a cold cola.\")\n# Output: \"\u0425\u0456\u0441 \u043a\u043e\u043d\u0448\u0454\u043d\u0441 \u0432\u0430\u0441 \u043a\u043b\u0456\u0430\u0440, \u0435\u0432\u0435\u043d \u0430\u0441 \u0445\u0456 \u0442\u0440\u0454\u0434 \u0442\u043e \u043c\u0430\u0456\u043d\u0442\u0430\u0456\u043d \u0441\u0456 \u043a\u043e\u043d\u0441\u0456\u0441\u0442\u0435\u043d\u043a\u0456 \u043e\u0444 \u0445\u0456\u0441 \u0432\u043e\u0440\u043a \"\n#         \"\u043e\u043d \u0441\u0456 \u041b\u0443\u0441\u0435\u043d \u043f\u0440\u043e\u0436\u0435\u0447\u0442 \u0444\u043e\u0440 \u0421\u0430\u043c\u0441\u0443\u043d\u0433, \u0432\u0456\u043b \u0441\u0456\u043f\u043f\u0456\u043d\u0433 \u0430 \u043a\u043e\u043b\u0434 \u043a\u043e\u043b\u0430.\"\n\n```\n\n##### `en2ru_translit(text)`\nTransliterates English text into Russian, preserving phonetic 
accuracy.\n\n- **Parameters**:\n  - `text` (*str*): The input English text to transliterate.\n\n- **Returns**:\n  - A Russian transliterated string (*str*).\n\n- **Example**:\n\n```python\nen2ru_translit(\n  \"After a long day, he enjoyed a refreshing Borjomi and Coca-Cola, \"\n  \"feeling victorious like Vici in battle, while watching \"\n  \"the lively citrus circus under the bright lights.\")\n# Output: \"\u0410\u0444\u0442\u0435\u0440 \u0430 \u043b\u043e\u043d\u0433 \u0434\u0430\u0439, \u0445\u0438 \u0435\u043d\u0436\u043e\u0435\u0434 \u0430 \u0440\u0435\u0444\u0440\u0435\u0448\u0438\u043d\u0433 \u0411\u043e\u0440\u0436\u043e\u043c\u0438 \u0430\u043d\u0434 \u041a\u043e\u043a\u0430-\u041a\u043e\u043b\u0430, \"\n#         \"\u0444\u0438\u043b\u0438\u043d\u0433 \u0432\u0438\u0447\u0442\u043e\u0440\u0439\u043e\u0443\u0441 \u043b\u0438\u043a \u0412\u0438\u0447\u0438 \u0438\u043d \u0431\u0430\u0442\u0442\u043b, \u0432\u0438\u043b \u0432\u0430\u0442\u0447\u0438\u043d\u0433 \"\n#         \"\u0441\u0438 \u043b\u0438\u0432\u0435\u043b\u0438 \u043a\u0438\u0442\u0440\u0443\u0441 \u043a\u0438\u0440\u043a\u0443\u0441 \u0430\u043d\u0434\u0435\u0440 \u0441\u0438 \u0431\u0440\u0438\u0436\u0442 \u043b\u0430\u0439\u0442\u0441.\"\n\n```\n\n##### `uk2ru_translit(text)`\nTransliterates Ukrainian text into Russian while maintaining phonetic consistency.\n\n- **Parameters**:\n  - `text` (*str*): The input Ukrainian text to transliterate.\n\n- **Returns**:\n  - A Russian transliterated string (*str*).\n\n- **Example**:\n\n```python\nuk2ru_translit(\"\u0423 \u043c\u0440\u0456\u0457 \u0432\u043e\u043d\u0430 \u0432\u0438\u0440\u0443\u0448\u0438\u043b\u0430 \u043d\u0430 \u043f\u043e\u0434\u0432\u0456\u0440\u2019\u044f, \u0434\u0435 \u0432\u0456\u0442\u0435\u0440 \u0440\u043e\u0437\u043d\u043e\u0441\u0438\u0432 \u043f\u0456\u0440\u2019\u0457\u043d\u043a\u0438, \u0456 \u0432\u0441\u0435 \u043d\u0430\u0432\u043a\u043e\u043b\u043e \u0441\u0442\u0430\u043b\u043e 
\u043a\u0430\u0437\u043a\u043e\u044e.\")\n# Output: \"\u0423 \u043c\u0440\u0438\u0439\u0438 \u0432\u043e\u043d\u0430 \u0432\u044b\u0440\u0443\u0448\u044b\u043b\u0430 \u043d\u0430 \u043f\u043e\u0434\u0432\u0438\u0440\u044c\u044f, \u0434\u0435 \u0432\u0438\u0442\u0435\u0440 \u0440\u043e\u0437\u043d\u043e\u0441\u044b\u0432 \u043f\u0438\u0440\u044c\u0439\u0438\u043d\u043a\u044b, \u0438 \u0432\u0441\u0435 \u043d\u0430\u0432\u043a\u043e\u043b\u043e \u0441\u0442\u0430\u043b\u043e \u043a\u0430\u0437\u043a\u043e\u044e.\"\n```\n\n---\n\n\n## Environment Variables\n\nThe following environment variables can be used to customize the behavior of the package.\nEach module validates its environment variables to ensure they meet the expected constraints. \nInvalid values will raise an `InvalidEnvironmentVariable` exception. \nDefault values are used when the variables are not explicitly set.\n\n### FlexiDict environment variables\n- **`DEFAULT_TOPN_LEAVES`** (default: `10`): A positive integer representing the maximum number of top leaves to retrieve in searches. 
Must be greater than `0`.\n- **`MIN_CORRECTION_PRICE`** (default: `1e-5`): A float in the range `[0, 1]`, representing the minimum price for applying a correction.\n- **`MAX_CORRECTION_RATE`** (default: `2/3`): A float in the range `[0, 1]`, representing the maximum correction rate allowed.\n- **`MAX_CORRECTION_RATE_FOR_SEARCH`** (default: `1.`): A float in the range `[0, 1]`, representing the maximum correction rate allowed when adding leaves.\n- **`DEFAULT_DELETION_PRICE`** (default: `0.4`): A float in the range `[0, 1]`, representing the cost of a deletion operation.\n- **`DEFAULT_SUBSTITUTION_PRICE`** (default: `0.2`): A float in the range `[0, 1]`, representing the cost of a substitution operation.\n- **`DEFAULT_INSERTION_PRICE`** (default: `0.05`): A float in the range `[0, 1]`, representing the cost of an insertion operation.\n- **`DEFAULT_TRANSPOSITION_PRICE`** (default: `0.35`): A float in the range `[0, 1]`, representing the cost of a transposition operation.\n- **`MAX_QUEUE_SIZE`** (default: `1024`): A positive integer defining the maximum queue size for processing tasks. Must be greater than `0`.\n\n### LiteSearch environment variables\n- **`MIN_START_TOKEN_LENGTH`** (default: `3`): A positive integer defining the minimum length of a starting token. Must be greater than `0`.\n- **`DEFAULT_QUERY_TRANSFORMATION_PRICE`** (default: `0.4`): A float in the range `[0, \u221e)`, representing the cost of a query transformation. Must be non-negative.\n\n### NumeralConverter environment variables\n- **`MAX_NUMERAL_LENGTH`** (default: `2048`): A positive integer defining the maximum length of a numeral string to process.\n\n---\n\n## Installation\n\nYou can install flexi-nlp-tools from PyPI using pip:\n\n```bash\npip install flexi-nlp-tools\n```\n---\n\n## Demo\n\nCheck out the live demo of Flexi NLP Tools here:\n\n[Flexi NLP Tools Demo](https://flexi-nlp-tools.fly.dev/)\n\n---\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n",
    "bugtrack_url": null,
    "license": "MIT",
    "summary": "NLP toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a focus on simplicity, performance, and flexibility.",
    "version": "0.5.5",
    "project_urls": null,
    "split_keywords": [
        "fuzzy search",
        " flexi search",
        " nlp tools",
        " natural language processing",
        " text processing",
        " string matching",
        " phonetic matching",
        " language tools",
        " transliteration",
        " transliterator",
        " text conversion",
        " numeral converter",
        " text normalization",
        " linguistic tools",
        " rule-based transliteration"
    ],
    "urls": [
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "0a912d31d0f6e262028cf3c3289bc76afaedd42a8b5c2c2e21a28a6cc076f493",
                "md5": "c1955a1ee2d36ad373c37e63013b37d0",
                "sha256": "eb518714ce51cd3b5607f3e4d0640070c7cfc2044a75bb1226acc9cdae79c4c1"
            },
            "downloads": -1,
            "filename": "flexi_nlp_tools-0.5.5-py3-none-any.whl",
            "has_sig": false,
            "md5_digest": "c1955a1ee2d36ad373c37e63013b37d0",
            "packagetype": "bdist_wheel",
            "python_version": "py3",
            "requires_python": ">=3.11",
            "size": 73452,
            "upload_time": "2025-02-12T13:38:38",
            "upload_time_iso_8601": "2025-02-12T13:38:38.917345Z",
            "url": "https://files.pythonhosted.org/packages/0a/91/2d31d0f6e262028cf3c3289bc76afaedd42a8b5c2c2e21a28a6cc076f493/flexi_nlp_tools-0.5.5-py3-none-any.whl",
            "yanked": false,
            "yanked_reason": null
        },
        {
            "comment_text": null,
            "digests": {
                "blake2b_256": "2c0a1a5e45ff360ec2bea8527451d02eaa7a548a97e154f200f6ef0b0969f8ff",
                "md5": "fd1596b67fb85bd777a2dce469bb9828",
                "sha256": "3f9d7a804a5dbb68af30ed3034084a55c82442c6ef05bdae780d0f60b5f8c121"
            },
            "downloads": -1,
            "filename": "flexi_nlp_tools-0.5.5.tar.gz",
            "has_sig": false,
            "md5_digest": "fd1596b67fb85bd777a2dce469bb9828",
            "packagetype": "sdist",
            "python_version": "source",
            "requires_python": ">=3.11",
            "size": 66159,
            "upload_time": "2025-02-12T13:38:40",
            "upload_time_iso_8601": "2025-02-12T13:38:40.389490Z",
            "url": "https://files.pythonhosted.org/packages/2c/0a/1a5e45ff360ec2bea8527451d02eaa7a548a97e154f200f6ef0b0969f8ff/flexi_nlp_tools-0.5.5.tar.gz",
            "yanked": false,
            "yanked_reason": null
        }
    ],
    "upload_time": "2025-02-12 13:38:40",
    "github": false,
    "gitlab": false,
    "bitbucket": false,
    "codeberg": false,
    "lcname": "flexi-nlp-tools"
}
        