# 🔥 Single Model Embedding & Reranking API
<div align="center">
<strong>Lightning-fast local embeddings & reranking for Apple Silicon (MLX-first, OpenAI & TEI compatible)</strong>
<br/><br/>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/v/embed-rerank?logo=pypi&logoColor=white" /></a>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/dm/embed-rerank?logo=pypi&logoColor=white" /></a>
<a href="https://pypi.org/project/embed-rerank/"><img src="https://img.shields.io/pypi/pyversions/embed-rerank?logo=python&logoColor=white" /></a>
<a href="https://github.com/joonsoo-me/embed-rerank/blob/main/LICENSE"><img src="https://img.shields.io/github/license/joonsoo-me/embed-rerank?logo=opensource&logoColor=white" /></a>
<a href="https://developer.apple.com/silicon/"><img src="https://img.shields.io/badge/Apple_Silicon-Ready-blue?logo=apple&logoColor=white" /></a>
<a href="https://ml-explore.github.io/mlx/"><img src="https://img.shields.io/badge/MLX-Optimized-green?logo=apple&logoColor=white" /></a>
<a href="https://fastapi.tiangolo.com/"><img src="https://img.shields.io/badge/FastAPI-009688?logo=fastapi&logoColor=white" /></a>
</div>
---
## ⚡ Why This Matters
Transform your text processing with **10x faster** embeddings and reranking on Apple Silicon. Drop-in replacement for OpenAI API and Hugging Face TEI with **zero code changes** required.
### 🏆 Performance Comparison
| Operation | This API (MLX) | OpenAI API | Hugging Face TEI |
|-----------|----------------|------------|------------------|
| **Embeddings** | `0.78ms` | `200ms+` | `15ms` |
| **Reranking** | `1.04ms` | `N/A` | `25ms` |
| **Model Loading** | `0.36s` | `N/A` | `3.2s` |
| **Cost** | `$0` | `$0.02/1K` | `$0` |
*Tested on Apple M4 Max*
---
## 🚀 Quick Start
### Option 1: Install from PyPI (Recommended)
```bash
# Install the package
pip install embed-rerank
# Start the server (default port 9000)
embed-rerank
# Or with custom port and options
embed-rerank --port 8080 --host 127.0.0.1
# See all options
embed-rerank --help
```
### Option 2: From Source (Development)
```bash
# 1. Clone and setup
git clone https://github.com/joonsoo-me/embed-rerank.git
cd embed-rerank
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# 2. Start server (macOS/Linux)
./tools/server-run.sh
# 3. Test it works
curl http://localhost:9000/health/
```
🎉 **Done!** Visit http://localhost:9000/docs for interactive API documentation.
---
## 🛠 Server Management (macOS/Linux)
```bash
# Start server (background)
./tools/server-run.sh
# Start server (foreground/development)
./tools/server-run-foreground.sh
# Stop server
./tools/server-stop.sh
```
> **Windows Support**: Coming soon! Currently optimized for macOS/Linux.
---
## ⚙️ CLI Configuration
### PyPI Package CLI Options
**Server Options:**
- `--host`: Server host (default: 0.0.0.0)
- `--port`: Server port (default: 9000)
- `--reload`: Enable auto-reload for development
- `--log-level`: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
**Testing Options:**
- `--test quick`: Run quick validation tests
- `--test performance`: Run performance benchmark tests
- `--test quality`: Run quality validation tests
- `--test full`: Run comprehensive test suite
- `--test-url`: Custom server URL for testing
- `--test-output`: Test output directory
**Examples:**
```bash
# Custom server configuration
embed-rerank --port 8080 --host 127.0.0.1 --reload
# Built-in performance testing
embed-rerank --port 8080 &
embed-rerank --test performance --test-url http://localhost:8080
pkill -f embed-rerank
# Environment variables
export PORT=8080 HOST=127.0.0.1
embed-rerank
```
### Source Code Configuration
Create `.env` file for development:
```env
# Server
PORT=9000
HOST=0.0.0.0
# Backend
BACKEND=auto # auto | mlx | torch
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ
# Model Cache (first run downloads ~2.3GB model)
MODEL_PATH= # Custom model directory
TRANSFORMERS_CACHE= # HF cache override
# Default: ~/.cache/huggingface/hub/
# Performance
BATCH_SIZE=32
MAX_TEXTS_PER_REQUEST=100
```
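For readers wiring this configuration into their own tooling, here is a minimal sketch of how the keys above could be modeled with `pydantic-settings` (already in the project's requirements). The `Settings` class below is illustrative only; the project's actual settings module may use different field names or defaults.

```python
# Illustrative sketch only: models the .env keys above with pydantic-settings.
# The project's real settings class may differ in names and defaults.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic's warning about `model_name`
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    port: int = 9000
    host: str = "0.0.0.0"
    backend: str = "auto"  # auto | mlx | torch
    model_name: str = "mlx-community/Qwen3-Embedding-4B-4bit-DWQ"
    batch_size: int = 32
    max_texts_per_request: int = 100


settings = Settings()  # values from .env, overridden by environment variables
print(settings.backend, settings.model_name)
```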
---
### 📂 Model Cache Management
The service automatically manages model downloads and caching:
| Environment Variable | Purpose | Default |
|---------------------|---------|---------|
| `MODEL_PATH` | Custom model directory | *(uses HF cache)* |
| `TRANSFORMERS_CACHE` | Override HF cache location | `~/.cache/huggingface/transformers` |
| `HF_HOME` | HF home directory | `~/.cache/huggingface` |
| *(auto)* | Default HF cache | `~/.cache/huggingface/hub/` |
#### Cache Location Check
```bash
# Find where your model is cached
python3 -c "
import os
print('MODEL_PATH:', os.getenv('MODEL_PATH', '<not set>'))
print('TRANSFORMERS_CACHE:', os.getenv('TRANSFORMERS_CACHE', '<not set>'))
print('HF_HOME:', os.getenv('HF_HOME', '<not set>'))
print('Default cache:', os.path.expanduser('~/.cache/huggingface/hub'))
"
# List cached Qwen3 models
ls ~/.cache/huggingface/hub | grep -i qwen3 || echo "No Qwen3 models found in cache"
```
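Because the first start downloads an ~2.3GB model, you may want to pre-fetch it into the cache (for example, when building a CI image). A small sketch using `huggingface_hub.snapshot_download`, which `transformers` pulls in as a dependency:

```python
# Pre-fetch the default model into the Hugging Face cache so the first
# server start doesn't block on a ~2.3GB download. Honors HF_HOME etc.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("mlx-community/Qwen3-Embedding-4B-4bit-DWQ")
print("Model cached at:", local_dir)
```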
---
## 🌐 Three APIs, One Service
| API | Endpoint | Use Case |
|-----|----------|----------|
| **Native** | `/api/v1/embed/`, `/api/v1/rerank/` | New projects |
| **OpenAI** | `/v1/embeddings` | Existing OpenAI code |
| **TEI** | `/embed`, `/rerank` | Hugging Face TEI replacement |
### OpenAI Compatible (Drop-in)
```python
import openai
client = openai.OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v1",
)

response = client.embeddings.create(
    input=["Hello world", "Apple Silicon is fast!"],
    model="text-embedding-ada-002",
)
# 🚀 10x faster than OpenAI, same code!
```
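The response object follows the standard OpenAI SDK schema, so downstream code that reads embeddings keeps working unchanged:

```python
# Standard OpenAI SDK response shape: one entry in .data per input text.
vector = response.data[0].embedding  # list[float]
print(f"{len(response.data)} embeddings, {len(vector)} dimensions each")
```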
### TEI Compatible
```bash
curl -X POST "http://localhost:9000/embed"
-H "Content-Type: application/json"
-d '{"inputs": ["Hello world"], "truncate": true}'
```
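The TEI surface also exposes `/rerank` (see the table above). A hedged sketch of a rerank call in Python, assuming the conventional TEI payload of `query` plus `texts`; confirm the exact schema against this server's `/docs`:

```python
# Hedged sketch: TEI-style rerank request. The {"query", "texts"} payload
# follows Hugging Face TEI conventions; verify the schema at /docs.
import httpx

resp = httpx.post(
    "http://localhost:9000/rerank",
    json={"query": "What is MLX?",
          "texts": ["MLX is Apple's ML framework", "Dogs are pets"]},
)
resp.raise_for_status()
print(resp.json())  # typically [{"index": ..., "score": ...}, ...] by score
```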
### Native API
```bash
# Embeddings
curl -X POST "http://localhost:9000/api/v1/embed/"
-H "Content-Type: application/json"
-d '{"texts": ["Apple Silicon", "MLX acceleration"]}'
# Reranking
curl -X POST "http://localhost:9000/api/v1/rerank/"
-H "Content-Type: application/json"
-d '{"query": "machine learning", "passages": ["AI is cool", "Dogs are pets", "MLX is fast"]}'
```
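The same calls from Python, using `httpx` (already in the project's requirements). The exact response field names aren't documented here, so this sketch just prints the parsed JSON; see `/docs` for the schemas:

```python
# Minimal native-API client sketch. Response schemas aren't spelled out in
# this README, so we print the raw JSON; inspect /docs for the field names.
import httpx

BASE = "http://localhost:9000/api/v1"

emb = httpx.post(f"{BASE}/embed/",
                 json={"texts": ["Apple Silicon", "MLX acceleration"]})
emb.raise_for_status()
print(emb.json())

rr = httpx.post(f"{BASE}/rerank/",
                json={"query": "machine learning",
                      "passages": ["AI is cool", "Dogs are pets", "MLX is fast"]})
rr.raise_for_status()
print(rr.json())
```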
---
## 🧪 Performance Testing & Validation
### 🚀 Built-in CLI Testing (PyPI Package)
The PyPI package includes powerful built-in testing capabilities:
```bash
# Quick validation (basic functionality check)
embed-rerank --test quick
# Performance benchmark (latency, throughput, concurrency)
embed-rerank --test performance --test-url http://localhost:9000
# Quality validation (semantic similarity, multilingual)
embed-rerank --test quality --test-url http://localhost:9000
# Full comprehensive test suite
embed-rerank --test full --test-url http://localhost:9000
```
**Test Results Include:**
- 📊 **Latency Metrics**: Mean, P95, P99 response times
- 🚀 **Throughput Analysis**: Texts/sec processing rates
- 🔄 **Concurrency Testing**: Multi-threaded request handling
- 🧠 **Semantic Validation**: Quality of embeddings and reranking
- 🌍 **Multilingual Support**: Cross-language performance
- 📈 **JSON Reports**: Detailed metrics for automation
**Example Output:**
```bash
🧪 Running Embed-Rerank Test Suite
📍 Target URL: http://localhost:9000
🎯 Test Mode: performance

⚡ Performance Results:
• Latency: 0.8ms avg, 1.2ms max
• Throughput: 1,250 texts/sec peak
• Concurrency: 5/5 successful (100%)
📁 Results saved to: ./test-results/performance_test_results.json
```
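Since reports are plain JSON, they slot directly into CI dashboards. A small sketch that loads the report written above; the key layout isn't fixed in this README, so it simply pretty-prints whatever metrics are present:

```python
# Load and pretty-print a report produced by `embed-rerank --test ...`.
# Key names aren't specified here, so no particular layout is assumed.
import json
from pathlib import Path

report = Path("./test-results/performance_test_results.json")
print(json.dumps(json.loads(report.read_text()), indent=2))
```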
### 🔧 Advanced Testing (Source Code)
For development and comprehensive testing with the source code:
```bash
# Comprehensive test suite (shell script)
./tools/server-tests.sh
# Run with specific test modes
./tools/server-tests.sh --quick # Quick validation only
./tools/server-tests.sh --performance # Performance tests only
./tools/server-tests.sh --full # Full test suite
# Custom server URL
./tools/server-tests.sh --url http://localhost:8080
# Manual health check
curl http://localhost:9000/health/
# Unit tests with pytest
pytest tests/ -v
```
---
## 🛠 Development & Deployment
### Local Development (Source Code)
```bash
# Start server (background)
./tools/server-run.sh
# Start server (foreground/development)
./tools/server-run-foreground.sh
# Stop server
./tools/server-stop.sh
```
### Production Deployment (PyPI Package)
```bash
# Install and run
pip install embed-rerank
embed-rerank --port 9000 --host 0.0.0.0
# With custom configuration
embed-rerank --port 8080 --reload --log-level DEBUG
# Background deployment
embed-rerank --port 9000 &
```
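For scripted deployments you can gate traffic on the documented `GET /health/` endpoint. A hedged sketch that polls until the server reports ready (it assumes only that the endpoint returns HTTP 200 when healthy):

```python
# Poll GET /health/ until the server is ready before routing traffic.
# Assumes only that the documented endpoint returns HTTP 200 when healthy.
import time
import httpx

for _ in range(30):
    try:
        if httpx.get("http://localhost:9000/health/").status_code == 200:
            print("server is healthy")
            break
    except httpx.TransportError:
        pass  # not accepting connections yet
    time.sleep(1)
else:
    raise SystemExit("server did not become healthy in 30s")
```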
> **Windows Support**: Coming soon! Currently optimized for macOS/Linux.
---
## 🚀 What You Get
### 🎯 Core Features
- ✅ **Zero Code Changes**: Drop-in replacement for OpenAI API and TEI
- ⚡ **10x Performance**: Apple MLX acceleration on Apple Silicon
- 💰 **Zero Costs**: No API fees, runs locally
- 🔒 **Privacy**: Your data never leaves your machine
- 🎯 **Three APIs**: Native, OpenAI, and TEI compatibility
- 📊 **Production Ready**: Health checks, monitoring, structured logging
### 🧪 Built-in Testing & Benchmarking
- 📈 **CLI Performance Testing**: One-command benchmarking
- 🔄 **Concurrency Testing**: Multi-threaded request validation
- 🧠 **Quality Validation**: Semantic similarity and multilingual testing
- 📊 **JSON Reports**: Automated performance monitoring
- 🚀 **Real-time Metrics**: Latency, throughput, and success rates
### 🛠 Deployment Options
- 📦 **PyPI Package**: `pip install embed-rerank` for instant deployment
- 🔧 **Source Code**: Full development environment with advanced tooling
- 🌐 **Multi-API Support**: OpenAI, TEI, and native endpoints
- ⚙️ **Flexible Configuration**: Environment variables, CLI args, .env files
---
## Quick Reference
### Installation & Startup
```bash
# PyPI Package (Production)
pip install embed-rerank && embed-rerank
# Source Code (Development)
git clone https://github.com/joonsoo-me/embed-rerank.git
cd embed-rerank && ./tools/server-run.sh
```
### Performance Testing
```bash
# One-command benchmark
embed-rerank --test performance --test-url http://localhost:9000
# Comprehensive testing
./tools/server-tests.sh --full
```
### API Endpoints
- **Native**: `POST /api/v1/embed/` and `/api/v1/rerank/`
- **OpenAI**: `POST /v1/embeddings` (drop-in replacement)
- **TEI**: `POST /embed` and `/rerank` (Hugging Face compatible)
- **Health**: `GET /health/` (monitoring and diagnostics)
---
## 📄 License
MIT License - build amazing things with this code!