mlx-embeddings 0.0.1

- Summary: MLX-Embeddings is a package for running Vision and Language Embedding models locally on your Mac using MLX.
- Author: Prince Canuma <prince.gdt@gmail.com>
- Homepage: https://github.com/Blaizzy/mlx-embeddings
- Uploaded: 2024-08-17 18:33:52
- Requires Python: >=3.8
- License: GNU General Public License v3
- Keywords: mlx-embeddings
- Requirements: mlx (>=0.16.3), transformers (>=4.44.0)
# MLX-Embeddings

[![image](https://img.shields.io/pypi/v/mlx-embeddings.svg)](https://pypi.python.org/pypi/mlx-embeddings)

**MLX-Embeddings is a package for running Vision and Language Embedding models locally on your Mac using MLX.**

- Free software: GNU General Public License v3

## Features

- Generate embeddings for text using MLX models
- Support for single-item and batch processing
- Utilities for comparing text similarities

## Installation

You can install mlx-embeddings using pip:

```bash
pip install mlx-embeddings
```
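To track unreleased changes, installing straight from the GitHub repository should also work (standard pip behavior, assuming you have git available):

```bash
pip install git+https://github.com/Blaizzy/mlx-embeddings.git
```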

## Usage

### Single Item Embedding

To generate an embedding for a single piece of text:

```python
import mlx.core as mx
from mlx_embeddings.utils import load

# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")

# Prepare the text
text = "I like reading"

# Tokenize and generate the embedding
input_ids = tokenizer.encode(text, return_tensors="mlx")
outputs = model(input_ids)
# outputs[0] is the last hidden state (1, seq_len, hidden_size);
# take the first ([CLS]) token as the sentence embedding
embeddings = outputs[0][:, 0, :]
```
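
The result is an `mx.array`. If you want to hand it to NumPy-based tooling such as scikit-learn or matplotlib, MLX arrays convert directly; a minimal sketch (the 384 below is the hidden size of all-MiniLM-L6-v2):

```python
import numpy as np

# MLX arrays convert to NumPy with np.array()
embedding_np = np.array(embeddings)
print(embedding_np.shape)  # (1, 384) for all-MiniLM-L6-v2
```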

### Comparing Multiple Texts

To compare multiple texts using their embeddings:

```python
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import mlx.core as mx
from mlx_embeddings.utils import load

# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")

def get_embedding(text, model, tokenizer):
    input_ids = tokenizer.encode(text, return_tensors="mlx", padding=True, truncation=True, max_length=512)
    outputs = model(input_ids)
    # [CLS] token embedding; the trailing [0] drops the batch dimension
    embeddings = outputs[0][:, 0, :][0]
    return embeddings

# Sample texts
texts = [
    "I like grapes",
    "I like fruits",
    "The slow green turtle crawls under the busy ant."
]

# Generate embeddings
embeddings = [get_embedding(text, model, tokenizer) for text in texts]

# Compute similarity
similarity_matrix = cosine_similarity(embeddings)

# Visualize results
def plot_similarity_matrix(similarity_matrix, labels):
    plt.figure(figsize=(5, 4))
    sns.heatmap(similarity_matrix, annot=True, cmap='coolwarm', xticklabels=labels, yticklabels=labels)
    plt.title('Similarity Matrix Heatmap')
    plt.tight_layout()
    plt.show()

labels = [f"Text {i+1}" for i in range(len(texts))]
plot_similarity_matrix(similarity_matrix, labels)
```
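
If you would rather not depend on scikit-learn, cosine similarity is easy to compute with MLX ops alone. A minimal sketch (not part of the mlx-embeddings API, just plain `mlx.core`):

```python
import mlx.core as mx

def mlx_cosine_similarity(a: mx.array, b: mx.array) -> mx.array:
    # Normalize rows to unit length, then take pairwise dot products
    a = a / mx.sqrt(mx.sum(a * a, axis=-1, keepdims=True))
    b = b / mx.sqrt(mx.sum(b * b, axis=-1, keepdims=True))
    return a @ b.T

# Stack the per-text embeddings from above into an (N, dim) matrix
vectors = mx.stack(embeddings)
print(mlx_cosine_similarity(vectors, vectors))  # (N, N) similarity matrix
```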

### Batch Processing

For processing multiple texts at once:

```python
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import mlx.core as mx
from mlx_embeddings.utils import load

# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")

def get_embedding(texts, model, tokenizer):
    # Pad/truncate the whole batch to a common length
    inputs = tokenizer.batch_encode_plus(texts, return_tensors="mlx", padding=True, truncation=True, max_length=512)
    outputs = model(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"]
    )
    # Last hidden state: (batch, seq_len, hidden_size)
    return outputs[0]

def compute_and_print_similarity(embeddings):
    B, seq_len, dim = embeddings.shape
    # Flatten each sequence's token embeddings (padding included) into one row vector
    embeddings_2d = embeddings.reshape(B, -1)
    similarity_matrix = cosine_similarity(embeddings_2d)

    print("Similarity matrix between sequences:")
    print(similarity_matrix)
    print("\n")

    for i in range(B):
        for j in range(i+1, B):
            print(f"Similarity between sequence {i+1} and sequence {j+1}: {similarity_matrix[i][j]:.4f}")

    return similarity_matrix

# Sample texts
texts = [
    "I like grapes",
    "I like fruits",
    "The slow green turtle crawls under the busy ant."
]

embeddings = get_embedding(texts, model, tokenizer)
similarity_matrix = compute_and_print_similarity(embeddings)

# Visualize results (reuses plot_similarity_matrix from the previous example)
labels = [f"Text {i+1}" for i in range(len(texts))]
plot_similarity_matrix(similarity_matrix, labels)
```
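
Note that flattening `(B, seq_len, dim)` into one row per text compares sequences position by position, padding included. A common alternative in practice (standard technique, not an mlx-embeddings API) is masked mean pooling over tokens; a sketch:

```python
import mlx.core as mx

def mean_pool(last_hidden_state: mx.array, attention_mask: mx.array) -> mx.array:
    """Average token embeddings, ignoring padded positions."""
    # last_hidden_state: (B, L, D); attention_mask: (B, L) of 0/1
    mask = mx.expand_dims(attention_mask, -1).astype(last_hidden_state.dtype)  # (B, L, 1)
    summed = mx.sum(last_hidden_state * mask, axis=1)   # (B, D)
    counts = mx.maximum(mx.sum(mask, axis=1), 1e-9)     # (B, 1); avoid division by zero
    return summed / counts

# e.g. sentence_embeddings = mean_pool(outputs[0], inputs["attention_mask"])
```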

## Supported Model Architectures
MLX-Embeddings currently supports the following architectures for text embedding tasks; both load through the same `load` call, as shown in the example after this list:
- XLM-RoBERTa (Cross-lingual Language Model - Robustly Optimized BERT Approach)
- BERT (Bidirectional Encoder Representations from Transformers)
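
For example, an XLM-RoBERTa-based checkpoint loads exactly like the BERT-based models above (the model ID below is illustrative and assumes an MLX-compatible checkpoint):

```python
from mlx_embeddings.utils import load

# sentence-transformers/paraphrase-multilingual-mpnet-base-v2 is XLM-RoBERTa-based
model, tokenizer = load("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
```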

We're continuously working to expand our support for additional model architectures. Check our GitHub repository or documentation for the most up-to-date list of supported models and their specific versions.

## Contributing

Contributions to MLX-Embeddings are welcome! Please refer to our contribution guidelines for more information.

## License

This project is licensed under the GNU General Public License v3.

## Contact

For any questions or issues, please open an issue on the [GitHub repository](https://github.com/Blaizzy/mlx-embeddings).