# MLX-Embeddings
[PyPI](https://pypi.python.org/pypi/mlx-embeddings)
**MLX-Embeddings is a package for running Vision and Language Embedding models locally on your Mac using MLX.**
- Free software: GNU General Public License v3
## Features
- Generate embeddings for text using MLX models
- Support for single-item and batch processing
- Utilities for comparing text similarities
## Installation
You can install mlx-embeddings using pip:
```bash
pip install mlx-embeddings
```
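The package requires Python 3.8 or later, and its core dependencies are `mlx` (>= 0.16.3) and `transformers` (>= 4.44.0).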
## Usage
### Single Item Embedding
To generate an embedding for a single piece of text:
```python
import mlx.core as mx
from mlx_embeddings.utils import load
# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")
# Prepare the text
text = "I like reading"
# Tokenize and generate embedding
input_ids = tokenizer.encode(text, return_tensors="mlx")
outputs = model(input_ids)
embeddings = outputs[0][:, 0, :]  # first ([CLS]) token of the output sequence
```
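The resulting `embeddings` array has shape `(1, hidden_dim)`. As a quick, self-contained illustration of using it, here is a minimal cosine-similarity sketch in plain `mlx` (the `cosine_similarity_mx` helper and the second sentence below are our own additions, not part of the package):

```python
import mlx.core as mx

def cosine_similarity_mx(a, b):
    # Flatten the (1, dim) embeddings to 1-D, L2-normalize, then take the dot product.
    a, b = a.reshape(-1), b.reshape(-1)
    a = a / mx.sqrt(mx.sum(a * a))
    b = b / mx.sqrt(mx.sum(b * b))
    return mx.sum(a * b).item()

# Compare against a second sentence embedded the same way.
other = model(tokenizer.encode("I enjoy books", return_tensors="mlx"))[0][:, 0, :]
print(cosine_similarity_mx(embeddings, other))
```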
### Comparing Multiple Texts
To compare multiple texts using their embeddings:
```python
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import mlx.core as mx
from mlx_embeddings.utils import load
# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")
def get_embedding(text, model, tokenizer):
    input_ids = tokenizer.encode(text, return_tensors="mlx", padding=True, truncation=True, max_length=512)
    outputs = model(input_ids)
    embeddings = outputs[0][:, 0, :][0]
    return embeddings
# Sample texts
texts = [
"I like grapes",
"I like fruits",
"The slow green turtle crawls under the busy ant."
]
# Generate embeddings
embeddings = [get_embedding(text, model, tokenizer) for text in texts]
# Compute similarity
similarity_matrix = cosine_similarity(embeddings)
# Visualize results
def plot_similarity_matrix(similarity_matrix, labels):
    plt.figure(figsize=(5, 4))
    sns.heatmap(similarity_matrix, annot=True, cmap='coolwarm', xticklabels=labels, yticklabels=labels)
    plt.title('Similarity Matrix Heatmap')
    plt.tight_layout()
    plt.show()
labels = [f"Text {i+1}" for i in range(len(texts))]
plot_similarity_matrix(similarity_matrix, labels)
```
### Batch Processing
For processing multiple texts at once:
```python
from sklearn.metrics.pairwise import cosine_similarity
import matplotlib.pyplot as plt
import seaborn as sns
import mlx.core as mx
from mlx_embeddings.utils import load
# Load the model and tokenizer
model, tokenizer = load("sentence-transformers/all-MiniLM-L6-v2")
def get_embedding(texts, model, tokenizer):
    inputs = tokenizer.batch_encode_plus(texts, return_tensors="mlx", padding=True, truncation=True, max_length=512)
    outputs = model(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"]
    )
    return outputs[0]
def compute_and_print_similarity(embeddings):
    B, Seq_len, dim = embeddings.shape
    embeddings_2d = embeddings.reshape(B, -1)
    similarity_matrix = cosine_similarity(embeddings_2d)

    print("Similarity matrix between sequences:")
    print(similarity_matrix)
    print("\n")

    for i in range(B):
        for j in range(i + 1, B):
            print(f"Similarity between sequence {i+1} and sequence {j+1}: {similarity_matrix[i][j]:.4f}")

    return similarity_matrix
# Sample texts
texts = [
"I like grapes",
"I like fruits",
"The slow green turtle crawls under the busy ant."
]
embeddings = get_embedding(texts, model, tokenizer)
similarity_matrix = compute_and_print_similarity(embeddings)
# Visualize results (reuses plot_similarity_matrix from the previous example)
labels = [f"Text {i+1}" for i in range(len(texts))]
plot_similarity_matrix(similarity_matrix, labels)
```
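A note on pooling: the examples above take the first ([CLS]) token as the sentence embedding. The upstream `sentence-transformers/all-MiniLM-L6-v2` model card instead applies mean pooling over non-padding tokens, which you can reproduce from the batch outputs. A minimal sketch, assuming `outputs[0]` is the last hidden state of shape `(B, seq_len, dim)`:

```python
import mlx.core as mx

def mean_pool(last_hidden, attention_mask):
    # Zero out padding positions, then average over the sequence dimension.
    mask = mx.expand_dims(attention_mask, -1).astype(last_hidden.dtype)  # (B, L, 1)
    summed = mx.sum(last_hidden * mask, axis=1)                          # (B, dim)
    counts = mx.maximum(mx.sum(mask, axis=1), 1e-9)                      # (B, 1)
    return summed / counts

# e.g. inside get_embedding: return mean_pool(outputs[0], inputs["attention_mask"])
```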
## Supported Model Architectures
MLX-Embeddings supports a variety of model architectures for text embedding tasks. The currently supported architectures are:
- XLM-RoBERTa (Cross-lingual Language Model - Robustly Optimized BERT Approach)
- BERT (Bidirectional Encoder Representations from Transformers)
We're continuously working to expand our support for additional model architectures. Check our GitHub repository or documentation for the most up-to-date list of supported models and their specific versions.
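Checkpoints built on these architectures are loaded through the same `load` call shown above. For example, a sketch with an XLM-RoBERTa-based model (`intfloat/multilingual-e5-base` is used purely for illustration; whether a given checkpoint converts cleanly depends on your installed version):

```python
from mlx_embeddings.utils import load

# Illustrative only: an XLM-RoBERTa-based checkpoint, loaded the same way
# as the BERT-based model in the examples above.
model, tokenizer = load("intfloat/multilingual-e5-base")
input_ids = tokenizer.encode("query: how do text embeddings work?", return_tensors="mlx")
embedding = model(input_ids)[0][:, 0, :]
```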
## Contributing
Contributions to MLX-Embeddings are welcome! Please refer to our contribution guidelines for more information.
## License
This project is licensed under the GNU General Public License v3.
## Contact
For any questions or issues, please open an issue on the [GitHub repository](https://github.com/Blaizzy/mlx-embeddings).