langvec

Name	langvec
Version	0.0.2
home_page	None
Summary	Language of Vectors (LangVec) is a simple Python library designed for transforming numerical vector data into a language-like structure using a predefined set of words (lexicon).
upload_time	2024-04-05 10:17:39
maintainer	None
docs_url	None
author	Simeon Emanuilov
requires_python	>=3.00
license	MIT
keywords	langvec semantic search vectorization
requirements	No requirements were recorded.

<p align="center">
  <img src="assets/logo.png" alt="LangVec Logo" width="150">
</p>

<p align="center">
  <i>Language of Vectors (LangVec) is a simple Python library designed for transforming numerical vector data into a language-like structure using a predefined set of words (lexicon).</i>
</p>

## Approach

The `LangVec` package uses percentile-based mapping to assign words from a lexicon to numerical values,
producing intuitive, human-readable representations of numerical data.

<p align="center">
  <img src="assets/langvec-schema.jpg" alt="LangVec Simplified schema" title="Simplified schema" width="900">
  <i>Simplified schema of how LangVec works</i>
</p>
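The percentile-based mapping can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not LangVec's actual implementation: it assumes that fitting pools all values from all vectors and splits them into as many equal-probability bins as there are lexicon words, and that prediction replaces each value with the word of the bin it falls into. The helper names `fit_edges` and `to_words` and the four-word lexicon are hypothetical.

```python
import numpy as np

# Hypothetical four-word lexicon for illustration only
LEXICON = ["alpha", "bravo", "charlie", "delta"]


def fit_edges(vectors, n_words):
    """Learn n_words - 1 inner percentile boundaries from all values (assumed approach)."""
    values = np.concatenate([np.asarray(v, dtype=float).ravel() for v in vectors])
    # Evenly spaced percentiles, e.g. 25, 50 and 75 for four words
    percentiles = np.linspace(0, 100, n_words + 1)[1:-1]
    return np.percentile(values, percentiles)


def to_words(vector, edges, lexicon):
    """Replace each value with the word of the percentile bin it falls into."""
    idx = np.searchsorted(edges, np.asarray(vector, dtype=float), side="right")
    return [lexicon[i] for i in idx]


rng = np.random.default_rng(0)
train = [rng.uniform(0, 1, 8) for _ in range(500)]
edges = fit_edges(train, len(LEXICON))
print(to_words([0.05, 0.4, 0.6, 0.95], edges, LEXICON))
```

Because the bin edges are percentiles of the training data rather than fixed thresholds, each word is used roughly equally often on data that resembles the training distribution.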

## Where to use LangVec

The main application is semantic search and similarity-based systems, where understanding the proximity between
vectors is crucial.
By transforming complex numerical vectors into a lexicon-based representation, `LangVec` makes these similarities
easier for humans to interpret.

In fields like machine learning and natural language processing, `LangVec` can assist in tasks such as clustering or
categorizing data, where a human-readable format is preferable for quick insights and decision-making.
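To illustrate how a lexicon-based representation can expose proximity, the sketch below maps vectors to words using fixed, assumed quartile edges (standing in for percentiles learned by `fit`) and compares two word sequences position by position. The `word_overlap` metric, the lexicon, and all other names here are hypothetical illustrations, not part of LangVec.

```python
import numpy as np

# Hypothetical lexicon and assumed quartile edges for values in [0, 1]
LEXICON = ["low", "mid-low", "mid-high", "high"]
EDGES = np.array([0.25, 0.5, 0.75])


def to_words(vector):
    """Map each value to the word of the bin it falls into."""
    idx = np.searchsorted(EDGES, np.asarray(vector, dtype=float), side="right")
    return [LEXICON[i] for i in idx]


def word_overlap(words_a, words_b):
    """Fraction of positions where two word sequences agree (hypothetical metric)."""
    if len(words_a) != len(words_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(words_a, words_b))
    return matches / len(words_a)


query = np.array([0.10, 0.30, 0.60, 0.90, 0.40])
near = query + np.array([0.02, -0.03, 0.04, 0.01, 0.05])  # small perturbation
far = 1.0 - query  # mirrored vector

print(to_words(query))  # ['low', 'mid-low', 'mid-high', 'high', 'mid-low']
print(word_overlap(to_words(query), to_words(near)))  # 1.0
print(word_overlap(to_words(query), to_words(far)))  # 0.0
```

Here the small perturbation keeps every word unchanged, while the mirrored vector shares none. Values that lie close to a bin edge can still flip to a neighbouring word, so word agreement is only a coarse proxy for numeric distance.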

## Installation

```bash
pip install langvec
```

## Usage

### Example 1

```python
import numpy as np

from langvec import LangVec

# Fix the random seed for reproducible results
np.random.seed(42)

# Initialize LangVec
lv = LangVec()
NUM_VECTORS = 1000
DIMENSIONS = 10

# Generate some random data
vectors = [np.random.uniform(0, 1, DIMENSIONS) for _ in range(NUM_VECTORS)]

# Fit to this data (learn the distribution of its values)
lv.fit(vectors)

# Save current model
lv.save("model.lv")

# Example vector for prediction
input_vector = np.random.uniform(0, 1, DIMENSIONS)

# Make a prediction for an unseen vector
print(lv.predict(input_vector))
```

### Example 2

```python
import string

import numpy as np

from langvec import LangVec

np.random.seed(42)

# Define a new lexicon with lowercase and uppercase letters
LEXICON = list(string.ascii_letters)

# Initialize LangVec with the new lexicon
lv = LangVec(lexicon=LEXICON)

NUM_VECTORS = 10000
DIMENSIONS = 256

# Generate some random data
vectors = [np.random.uniform(0, 1, DIMENSIONS) for _ in range(NUM_VECTORS)]

# Fit to this data
lv.fit(vectors)

# Example vector for prediction
input_vector = np.random.uniform(0, 1, DIMENSIONS)

# Make prediction on the unseen vector embedding
predicted_string = "".join(lv.predict(input_vector))
print(predicted_string)
# Shorten long outputs to their first and last three characters
if len(predicted_string) > 6:
    summarized_string = predicted_string[:3] + "..." + predicted_string[-3:]
else:
    summarized_string = predicted_string

print(summarized_string)
```
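The truncation logic at the end of Example 2 can be factored into a small reusable helper. `summarize` is a hypothetical name used only for illustration and is not part of LangVec; the default `keep=3` reproduces the example's first-three/last-three behavior.

```python
def summarize(text: str, keep: int = 3) -> str:
    """Shorten `text` to its first and last `keep` characters joined by '...'."""
    # Only truncate when the result would actually be shorter than the input
    if len(text) > 2 * keep:
        return text[:keep] + "..." + text[-keep:]
    return text


print(summarize("aBcDeFgHiJ"))  # aBc...HiJ
print(summarize("abc"))  # abc
```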

## Save and load model from disk

LangVec can save the learned percentiles to disk as a model artifact and load them back later. This preserves the
learned distribution, so you don't need to fit the model again. You can use the following methods:

### Save model

```python
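import numpy as np

from langvec import LangVec

# Initialize LangVec and fit it, so there are learned percentiles to save
lv = LangVec()
lv.fit([np.random.uniform(0, 1, 10) for _ in range(1000)])

# Save the model to file
lv.save("model.lv")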
```

### Load model

```python
from langvec import LangVec

# Initialize LangVec
lv = LangVec()

# Load the model from file
lv.load("model.lv")
```
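This README does not document the on-disk format of `.lv` files, so the following is only a sketch of what persisting percentiles as a model artifact can look like, assuming the model state consists of the lexicon plus the learned percentile edges. `save_model` and `load_model` are hypothetical helpers built on Python's `pickle` module; they are not LangVec's `save`/`load`.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_model(path, lexicon, edges):
    """Persist the lexicon and percentile edges (hypothetical format, not LangVec's)."""
    with open(path, "wb") as f:
        pickle.dump({"lexicon": list(lexicon), "edges": np.asarray(edges)}, f)


def load_model(path):
    """Read back the state written by save_model."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    return state["lexicon"], state["edges"]


lexicon = ["alpha", "bravo", "charlie", "delta"]
edges = np.percentile(np.random.default_rng(0).uniform(0, 1, 1000), [25, 50, 75])

# Round-trip through a temporary directory that is cleaned up afterwards
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sketch_model.pkl"
    save_model(path, lexicon, edges)
    loaded_lexicon, loaded_edges = load_model(path)

print(loaded_lexicon == lexicon, np.array_equal(loaded_edges, edges))  # True True
```

Only unpickle files from trusted sources: `pickle.load` can execute arbitrary code embedded in a malicious file.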

